
Babble

November 15, 2007

One of the odd quirks of running a site available in multiple languages is that I receive email in a language other than English from time to time. Not often, but with enough frequency that it’s something I’ve had to think about.

Short of learning the language, there’s no sound way to deal with these messages. Babelfish is notoriously bad for anything beyond simple word translations. But with a bit of effort, and reasonably intelligent parties on both ends of the inbox, it can kinda-sorta work. Here are a couple of quick things I’ve tried doing to help preserve the meaning of my message.

  • I use short sentences and simple English words, and avoid slang entirely. I try not to flex my vocabulary skills or write highly structured prose that requires an English major to decipher. I also used to avoid contractions, but it seems like most translators know how to deal with those, and I’d imagine that purposefully joining two words together could lead to less ambiguity for the translator (i.e., there’s no need to ask, “does that ‘not’ belong to ‘is not’ or ‘not mine’?”).

  • I run my own words through Babelfish. If any English words remain in the output, I try to rewrite using synonyms instead. Leftover English words are particularly frustrating for the person on the other end, because they’d likely turn to Babelfish as well, and I’ve just demonstrated it can’t translate those words. They might be able to figure out my intent from context, but given grammar differences between languages, I wouldn’t rely on it.

    On the other hand, not all languages have their own words for things, and some simply use English words instead. How do I know which is which? I usually don’t.

  • Speaking of grammar, without knowing the language I can’t adequately fake its sentence structure. English words fall in a particular order, but that order doesn’t necessarily make sense in other languages. I can get a hint of how different they are by translating my words into the language and then translating that result back to English, which sometimes gives me clues on how to rephrase my sentence.

    Here are a couple of quick round trips I did from English to Dutch and then back again:

    Yes. You have my permission to use a screenshot of the site.
    Ja. U hebt mijn toestemming om een screenshot van de plaats te gebruiken.
    Yes. You have to my authorisation use screenshot of.
    Yes. I grant permission to use a screenshot of the website.
    Ja. Ik verleen toestemming om een screenshot van de website te gebruiken.
    Yes. I grant authorisation to use screenshot of the Internet site.

    The first round trip tells me that “you have my” isn’t translating as well as I’d like, and the usage of “site” seems to be problematic. By rewording both, I get a result that seems to make more sense after coming back to English, and one that Faruk tells me is a passable Dutch translation to boot.

    That’s not to say it will always work that way, but with a bit of extra effort it does seem possible to create sentences that retain your intent.
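The “are any English words left in the output?” check from the second tip is mechanical enough to script. Here is a minimal sketch, assuming you already have the machine-translated text in hand (no translation service is called, and the function names `tokenize` and `leftover_words` are hypothetical). It lists words from the English source that come through unchanged in the translation. It only handles Latin-alphabet text, and short words shared by two languages (English and Dutch both have “die”, for instance) can produce false positives.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into Latin-alphabet words."""
    return re.findall(r"[a-z']+", text.lower())


def leftover_words(source: str, translated: str) -> list[str]:
    """Return source words that appear unchanged in the translation.

    Each word is reported once, in the order it first appears in the source.
    """
    target = set(tokenize(translated))
    leftovers: list[str] = []
    for word in tokenize(source):
        if word in target and word not in leftovers:
            leftovers.append(word)
    return leftovers


english = "Yes. You have my permission to use a screenshot of the site."
dutch = "Ja. U hebt mijn toestemming om een screenshot van de plaats te gebruiken."
print(leftover_words(english, dutch))  # ['screenshot']
```

A flagged word isn’t necessarily a problem: “screenshot” survives untranslated in both Dutch examples, and the second one was still judged passable. The script only narrows down which words deserve a second look; deciding whether the target language borrows the English word is still up to you.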

After all this, though, it’s still tempting to explicitly state that I used Babelfish for translations to, you know, avoid sparking international incidents.


November 16, 12h

Have you (or anyone else, for that matter) done any research as to the reliability of different translation engines? That is, is Babelfish any more/less reliable or accurate than, say, Google Language Tools (http://www.google.com/language_tools)?

M. says:
November 16, 12h

You should really check out translate.google.com. So far it seems to be a lot more intelligent than Babelfish.

Robin says:
November 16, 12h

Add me to the list of people that use Google Translator (http://translate.google.com/). I haven’t used Babelfish much, but I find Google Translator to be a lot smarter.

pieter says:
November 16, 13h

The only problem with the first translation is “site”, which is translated to “plaats” (place). Looks like translation from Dutch to English is worse than the other way around.

November 16, 13h

Excellent advice. I sometimes use a web translator to help in replying to Ma.gnolia support emails that come in with non-English messages. I add the line “I do not speak [language]. I am using an online tool to help translate.” to the translated text.

I’ve found Google and Babelfish about 50/50 so far. I use both on any translation job.

November 16, 14h

I *think* Google’s translation feature recently switched from using the same engine everyone else used to their own engine. I frequently send emails translated from English to Spanish and vice versa, and lately Google’s translations have been much improved. Head and shoulders above the Babelfish/AltaVista/MSN/etc. services.

David Robarts says:
November 16, 20h

Google is experimenting with a big data/Rosetta Stone approach. Rather than programming translators, Google is trying to make a program that learns to translate new texts by studying texts available in multiple languages, so one would hope that their translator is continuously refining itself.

Bart says:
November 17, 02h

There’s no problem in using ‘site’ or ‘website’ in Dutch sentences. Those words are commonly used in The Netherlands. As Pieter stated yesterday, the word ‘plaats’ actually means (a physical) ‘place’ in English.

November 17, 13h

Richard Ishida’s presentation at @media 2007, which is available as slides and audio, gives a good perspective on language difficulties with translation. It’s worth listening to if, like me, you are too poor to make it to such events.

Bryan says:
November 17, 20h

I tend to use Google Translate myself and have been quite surprised at how well it’s done at times, although it’s not perfect, mind you.

I was producing a Chinese graduation album but don’t speak a word of the language, let alone write it. I typed in a few words I needed for headings, and all of them came back correct when sent to China to check.

November 18, 15h

@Pieter, Bart:

The word “site” in English originally meant “location”, “position”, or “place”, so in certain contexts, “plaats” is actually a perfectly valid translation!

The “places” you visit with a web browser were originally called “web sites” (i.e., places or locations on the web); this was later shortened to “sites”.

The second example uses the longer form, thereby disambiguating the word “site”, which leads to a better translation. (If the source text isn’t explicit about the type of site, the translation program would have to “know” that you can’t take a “screenshot” of a physical location –but that kind of reasoning is still a bit too sophisticated for today’s software!)

Durf says:
November 18, 18h

I would caution against using “round trips” as a gauge of anything like quality or meaning. Translation engines are fundamentally flawed, and when you go En-Ne-En you aren’t confirming the value of the first step in that process by giving the machine twice as many chances to mangle your ideas.

Your “look for English words remaining in the output and try rephrasing” is a good bit of advice, but beyond that if you’re going to rely on machine translation, rely on it once, not twice; fire and forget.

Erik says:
November 19, 01h

Marcel Feenstra wrote: “the translation program would have to “know” that you can’t take a “screenshot” of a physical location –but that kind of reasoning is still a bit too sophisticated for today’s software!”

Google translates the first example correctly.
I think Google ‘knows’ the right translation not because it can reason, but because it uses statistics. ‘Site’ in the proximity of ‘screenshot’ usually means ‘website’, not place.

November 19, 03h

@Erik:

Yes, Google did translate the first example correctly.

However, when I tried: “This plot of land will make a good site for the school” (where site can only mean “location”, not “web site”), I got: “Dit perceel zal maken van een goede site voor de school” –so perhaps Google *always* “translates” the word “site” as “site”… ;-)

I do agree with you, though, that Google’s “knowledge” is based firmly on statistics! (I did find it interesting that they ask for human feedback through their link, “Suggest a better translation”…)

Nic says:
November 22, 03h

Babelfish certainly seems to handle technical terms better than Google for me (German-English):

Google:
steuerspannung -> steuerspannung
steuer spannung -> Tax tension

Altavista Babelfish:
steuerspannung -> control voltage (that’s 100%)

pjetr says:
November 22, 03h

@ Marcel Feenstra:
“Site” is also a perfectly correct Dutch word. Different pronunciation, same word, same meaning.

Eric Ritz says:
November 22, 19h

Erik says, “I think Google ‘knows’ the right translation not because it can reason, but because it uses statistics. ‘Site’ in the proximity of ‘screenshot’ usually means ‘website’, not place.”

The field of Natural Language Processing is one of statistical analysis, since “teaching” a computer grammar is not realistically feasible. That being said, Google’s translator would likely have a tremendous corpus at its fingertips for refining its translations, and the bigger the corpus the better.

I have not used Google Translate, but I have used Babelfish, and its results are hit and miss, more so miss. Such web translation programs seem to fare best with Indo-European languages, especially when converting to or from English. I would be very wary of using them for languages which sharply diverge from English grammar (e.g. Japanese or Arabic).

Ace says:
November 24, 09h

I live in Belgium and speak both Dutch and English natively, and I feel the need to remark that Dutch is a very difficult and inconsistent language for people trying to learn it. Heck, it’s even hard for native speakers. There are so many different spelling rules, and what’s more, they change every so often! In the end nobody remembers how to spell a particular word.

Another nice thing to know about Dutch is that nobody is bothered by English words, since we don’t have our own terminology for every single word; that’s certainly true when it comes to tech-oriented stuff. Nobody minds, really. One could even go as far as addressing a random Dutch-speaking person in English and be fairly sure he or she would understand you perfectly and would most probably be able to answer you without too much trouble. At least that’s what it’s like over here in Belgium. I think it’s because we are used to picking up languages (our national languages are Dutch, French, German and English). Over here, it’s compulsory to have mastered all of these languages to a certain extent by the end of high school.

So in your particular example you could’ve just answered your addressee in English without worrying too much about getting the essence of your message across.

November 26, 14h

My website is a mixture of Japanese and English, simply because learning Japanese is my hobby. So, inevitably I get a few messages and comments in Japanese.

For me it’s an opportunity to practice, but for anyone who’s not interested it’s bound to be frustrating!

“Yes. You have my permission to use a screenshot of the site.”

~ はい、その サイト の スクリーンショット を 使用して もよい。

“Hai (yes), sono (the) saito (site) no (of) sukuriinshotto (screenshot) o shiyoushite mo yoi (may use).”

And that concludes today’s Japanese lesson :)

BenSky says:
December 03, 04h

Thank God my website is in English only; I have enough trouble replying to English emails, let alone deciphering another language!
Tell your local school that you could provide plenty of assignments for their language classes!

Shawn says:
January 31, 10h

I did an experiment with a German lady I’m corresponding with on last.fm who uses machine translation. I ran a message in English through both Babelfish and Google, and she said the Babelfish translation was better. Google does better with Spanish and Portuguese, though; those are languages I speak. Babelfish thinks “amigos” in Spanish means “friendly” rather than “friends”, which is what it actually means. That’s a really stupid mistake, and Babelfish is full of those. For a while, if you typed the word “take” into Babelfish and translated it to Portuguese, it would always come out as “fazer exame de” (to take a test), no matter the context. Google also handles contractions like “aren’t” and “won’t”; Babelfish simply chokes on them, which is really sad.