What do you do when your wife gets a job in Nepal and you tag along? You help to build the foundation for machine translation between Esperanto and Nepali, of course!
That’s what Jacob Nordfalk did. He was the first speaker at today’s session of Talk IT at Copenhagen Business School.
Jacob talked about working with Apertium, a free and open-source machine-translation platform. Don’t worry, translator friends, this was not a push to replace the human element! The value here is a machine translation tool that is open source and free. Participation in Apertium does require XML knowledge as well as knowledge of the languages used in the corpus, the body of electronic texts that provides the translation foundation. Jacob has even received stipends from Google Summer of Code for projects to build the corpus for Nordic languages.
Apertium does a rule-based type of translation, making it more reliable, but more boring. Jacob calls it “shallow translation”. If you enter “see the dog run”, then Apertium translates those four individual words, whereas Google Translate looks at the entire phrase as a unit. (The amount of calculation that Google Translate uses is quite mind-boggling and impressive.)
Jacob walked us through Apertium’s approach to translation, showing us how rules are applied for a simple translation from English to Esperanto. (Most languages in Apertium are Indo-European – and now Nepali – but that can change as more people contribute to this tool.)
A Pen is a Pen is a Pen
A lot of discussion came out of his simple phrase “He saw a pen”. The first step was determining what pen was – a noun or a verb. Because of its position in the phrase, it was considered a noun. Esperanto had only “plumo” as the word for pen – a writing instrument. It did not have the word for a place to hold animals and it did not have the slang word that was an abbreviation for penitentiary. Realizing how much effort had to go into calculations for such a simple phrase highlighted the overall intricacies of translation.
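To make those steps a little more concrete, here is a toy sketch in Python of the kind of reasoning Jacob described: pick a part of speech from the word's position, then look it up in a small bilingual dictionary. This is not Apertium's actual data format or API (Apertium works from XML dictionaries and rule files). The names `ANALYSES`, `BILINGUAL`, `tag`, and `translate` are made up for illustration, and the two position rules are simplifying assumptions of mine.

```python
# Toy illustration only: NOT Apertium's real dictionaries, rules, or API.

# Hypothetical analyses: each word with the parts of speech it could have.
ANALYSES = {
    "he": ["pron"],
    "saw": ["verb", "noun"],
    "a": ["det"],
    "pen": ["noun", "verb"],
}

# Hypothetical English -> Esperanto dictionary, keyed by (word, part of speech).
# Only the common "writing instrument" sense of "pen" is listed.
BILINGUAL = {
    ("he", "pron"): "li",
    ("saw", "verb"): "vidis",
    ("pen", "noun"): "plumo",
}


def tag(words):
    """Choose one part of speech per word using two simple position rules."""
    tagged = []
    for word in words:
        options = ANALYSES[word]
        prev_tag = tagged[-1][1] if tagged else None
        if prev_tag == "det" and "noun" in options:
            choice = "noun"  # after "a", read an ambiguous word as a noun
        elif prev_tag == "pron" and "verb" in options:
            choice = "verb"  # after "he", read an ambiguous word as a verb
        else:
            choice = options[0]  # otherwise fall back to the first analysis
        tagged.append((word, choice))
    return tagged


def translate(sentence):
    """Translate word by word, then apply two tiny Esperanto-specific rules."""
    words = sentence.lower().rstrip(".").split()
    output = []
    seen_verb = False
    for word, pos in tag(words):
        if pos == "det" and word in ("a", "an"):
            continue  # Esperanto has no indefinite article, so "a" is dropped
        target = BILINGUAL[(word, pos)]
        if pos == "noun" and seen_verb:
            target += "n"  # accusative ending marks the direct object
        if pos == "verb":
            seen_verb = True
        output.append(target)
    return " ".join(output).capitalize()


print(translate("He saw a pen"))
```

Because the made-up dictionary lists only the writing-instrument sense, "pen" can only ever come out as "plumo"; the animal enclosure and the slang for penitentiary are simply not available to the program.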
One person in the audience asked whether Apertium could handle irony. My personal opinion was – that’s where the human enters the equation! Jacob’s response was somewhat the same. It all depends on the context, and the human brain is needed for that.
Mozilla with a Mobile Twist
The second speaker of the day was Mike Kristoffersen from Mozilla. Unfortunately, our many discussions during Jacob’s presentation ate into Mike’s time, so the talk was short and sweet. Mike came to tell us about Firefox for Mobile. I saw a presentation on Fennec (the code name for the project) at Reboot 11 in June 2009, so it was interesting to see how the project had developed since then.
Mike also told us about other Mozilla activities to emphasize that Mozilla was not just Firefox! He opened with Mozilla’s declaration that
We believe in the power and potential of the Internet and want to see it thrive for everyone, everywhere.
He then walked us through several of the projects Mozilla runs to promote an open internet, such as Drumbeat.
My favorite Drumbeat project is the Universal Subtitles project. This can help build text needed for the deaf (captioning) and translations for those who do not speak the language of the original recording. In light of this project, it was appropriate that an entire slide was dedicated to mentioning Mozilla’s support of accessibility in a statement that echoed Tim Berners-Lee. You know. This quote:
The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect.
P2PU – Peer to Peer University – is about teaching the latest open web technologies to young people and helping them to develop an attitude to succeed in open collaborative projects. P2PU made me think about the overlap with both Scrunch Up magazine for young designers and developers and WaSP InterACT. They’re not quite the same, but I think they should know about each other. Web standards and open source should be kissing cousins.
When Mike mentioned Mozilla grants, I was happy to see their support of the Ushahidi project in Chile. Ushahidi cannot get enough recognition, in my opinion. I think it is just the beginning of the great potential we will see coming from Africa in the coming decade.
I was surprised to hear that Denmark ranked lowest in the Nordic region for adopting Firefox. It goes to show how wrapped up you can become in your community. I know nerds and creative types who use Firefox, Opera, Chrome, and Safari – and all love to hate Internet Explorer. However, Internet Explorer has saturated the Danish workplace. Outside the world of geeks, people probably don’t care about browsers and are probably not even conscious of what a browser is! They just want to book a ticket, pay a bill, read the news, and so on, and they are not thinking “I am doing this thanks to my browser”. I took this bit of information as yet another reminder to never ever ever make any assumptions about how other people use computers and software! We had a nice little discussion on this point.
A mention of Mozilla Labs, where they experiment with crazy and not-so-crazy ideas, wrapped up Mike’s talk and this season of Talk IT. I look forward to the new season next fall.
Kudos to Jesper, Rikke, and David for a second successful year of Talk IT at Copenhagen Business School.
Hi, thanks for a great blog entry!
There are some minor things which I’d like to clarify. I mostly did machine translation between Esperanto and English, not Nepali. But I did an Esperanto-Nepali dictionary (which can be found at http://www.esperanto.org.np/vortaro).
And of course Esperanto has the word for a place to hold animals (ŝafejo – see e.g. http://manybooks.net/pages/hayesc1696716967-8/111.html). It also has a word for penitentiary (pentfarejo – see e.g. http://manybooks.net/pages/hayesc1696716967-8/112.html) but not a slang word abbreviation.
The thing is, these words were omitted from my dictionary as they are quite uncommon, and therefore the risk of misinterpreting a normal ‘pen’ as one of these uncommon ‘pen’ meanings would be too high.