Archive for the ‘machine translation’ Category

Controlled language and translation

28/07/2009

Not long ago, Tedopres International BV, a provider of technical documentation services with headquarters in the Netherlands, launched its Controlled Language website for English (also known as Simplified English). The main objective of a controlled language is to make technical text easy to understand. It rests on two basic principles: a controlled vocabulary and a set of grammatical rules.

Links:

http://www.controlledenglish.com

http://www.simplifiedenglish.net


It’s so irritating when the automatic translation or alignment of a sentence fails because of formatting issues or grammatical or terminological inconsistencies. An inappropriate use of a term, or the use of the passive instead of the active voice, can result in a lower match in the translation memory, and the translation itself will cost more in the long run, since everything below a 100% match needs to be translated by a human translator. Not to mention the misunderstandings that terminological inconsistencies and unclear sentence structure can cause.

A few words about Statistical Machine Translation (SMT) as well. One of the reasons SMT doesn’t work properly and still can’t pass a ‘Turing test’ for machine translation (if such a thing exists at all) is that it can only give good translations for segments that were previously fed into the system. So even if you had the biggest Translation Memory (TM) ever, containing every segment translated in human history, you still wouldn’t succeed in translating everything, because there will always be NEW sentences, phrases, terms, named entities and words that have never been written or translated before. You might get a 95%, 90% or 80% match but not a 100% match, which means human post-editing is required.
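To make the idea of a match percentage concrete, here is a minimal sketch in Python. It is only an illustration: the two TM entries are invented, and the similarity score comes from the standard library's `difflib.SequenceMatcher`, which is an assumption on my part and not the scoring method of any particular TM tool.

```python
from difflib import SequenceMatcher

# Toy translation memory: source segment -> stored translation (invented examples).
TM = {
    "Remove the cover from the pump.": "Verwijder het deksel van de pomp.",
    "Tighten the four bolts.": "Draai de vier bouten vast.",
}

def best_match(segment, memory):
    """Return (score in percent, stored source, stored translation) for the closest entry."""
    best = (0.0, None, None)
    for source, target in memory.items():
        # Character-based similarity; real TM tools use their own scoring.
        score = SequenceMatcher(None, segment, source).ratio() * 100
        if score > best[0]:
            best = (score, source, target)
    return best

def needs_human(segment, memory):
    """Anything below a 100% match goes to a human translator or post-editor."""
    return best_match(segment, memory)[0] < 100

print(needs_human("Remove the cover from the pump.", TM))   # exact repeat of a stored segment
print(needs_human("Remove the cover from the motor.", TM))  # new wording, only a fuzzy match
```

A sentence that repeats a stored segment exactly is reused as it is; changing a single word already turns it into a fuzzy match that somebody has to check.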

One way of optimizing MT and TMS is to use a controlled language. According to an article by Uwe Muegge (Controlled language: the next big thing in translation?), the reason why many companies, even bigger ones, still do not make use of the available translation technology is that they don’t understand how translation tools work. I would go even further: they don’t understand that in order to recycle your language material, new texts need to be RECYCLABLE!

To make this clear, translation agencies should explain to their clients the benefits of using controlled language for translation. Agencies are investing heavily in various translation solutions, and to take full advantage of them it is necessary to make the most of the available language data and to control the language at the level of the source text.

There are various tools for implementing controlled language (even Tedopres offers one), but I think companies could start by using and organizing their company-specific terminology systematically. By investing in controlled language, you will save on translation costs in the long run.
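Managing terminology systematically does not have to start with a big tool. As a rough sketch, a few lines of Python can already flag discouraged variants in a source text. The term base below is hypothetical and the function name is my own; a real controlled-language checker would also enforce grammar rules, which this sketch does not attempt.

```python
import re

# Hypothetical company term base: approved term -> variants writers should avoid.
TERM_BASE = {
    "cover": ["lid", "cap"],
    "tighten": ["fasten", "screw down"],
}

def find_unapproved_terms(text, term_base):
    """Return (variant, approved term) pairs for every discouraged variant found in text."""
    findings = []
    for approved, variants in term_base.items():
        for variant in variants:
            # Whole-word, case-insensitive match, so 'lid' does not hit 'valid'.
            pattern = r"\b" + re.escape(variant) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                findings.append((variant, approved))
    return findings

for variant, approved in find_unapproved_terms("Remove the lid and fasten the bolts.", TERM_BASE):
    print(f"Use '{approved}' instead of '{variant}'")
```

Running a check like this before a text goes to translation keeps the wording consistent, which is exactly what raises the match rate in the translation memory.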

The next challenge is, of course, to create controlled languages for languages other than English, German and the other common ones. And controlled language should also sound natural and appealing to people.

On Jeffrey Allen’s site you can find some interesting articles on the role of Controlled Language in (Machine) Translation.

Two things about Google Translate

25/06/2009

I’ve just read an article about how poor the Hungarian translations produced by Google are, and I’ve come to two interesting conclusions.


1. Google uses English in case of uncommon language combinations 

In my experiment, I used Google to translate a Dutch article about Robbie Williams into Hungarian and compared the result to the English translation of the same article. Looking at the translation mistakes, it becomes obvious that Google (not having enough language material for the Dutch-Hungarian combination) uses English to link the two languages. Often, this results in unnecessary mistakes.

For example, the Dutch ‘een shaggie draait’ (= ‘rolls a cigarette’) becomes ‘a shaggie running’ in English and, following the same pattern, ‘a shaggie fut’ in Hungarian, where ‘fut’ means ‘run’. This is evidence that Google translates the text (or at least part of it) via English. It’s clever in a way but not very efficient.

2. Secondly, and I’m quite irritated by this, if Google doesn’t have a translation for a word or a term, the word is used in the translation as it is (see the word ‘shaggie’ in the previous example). For users working in languages with few or no loan words from English, French etc., this won’t cause too many problems: you simply know that Google didn’t find the right translation for this word. But quite often it will be confusing, mostly when translating into languages with lots of loan words (for example Dutch). And when you try to use Google Translate for scientific texts that include a lot of terms, it will be even more confusing (if you type the English ‘diaphragma’ into Google, you get ‘diaphragma’ in Dutch, but it should actually be ‘diafragma’ with an ‘f’).
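Both observations can be imitated with a toy example in Python. The word lists below are invented and only cover the ‘shaggie’ phrase from above; this is a sketch of the general idea of translating via a pivot language and passing unknown words through unchanged, not a description of how Google’s system actually works internally.

```python
# Toy word-level dictionaries, invented for illustration only.
NL_TO_EN = {"een": "a", "draait": "running"}
EN_TO_HU = {"a": "a", "running": "fut"}

def translate(words, dictionary):
    """Look up each word; words missing from the dictionary are passed through unchanged."""
    return [dictionary.get(word, word) for word in words]

def pivot_translate(words, first, second):
    """Translate via an intermediate language by chaining two dictionaries."""
    return translate(translate(words, first), second)

dutch = ["een", "shaggie", "draait"]
english = translate(dutch, NL_TO_EN)
hungarian = pivot_translate(dutch, NL_TO_EN, EN_TO_HU)

print(" ".join(english))    # a shaggie running
print(" ".join(hungarian))  # a shaggie fut
```

The wrong choice made in the first step (‘running’ for ‘draait’) is carried straight into the second step, and the unknown word ‘shaggie’ survives both steps untouched.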

There are probably a few more things to mention about Google Translate, but instead of going on about the mistakes, I can tell you that it is still slightly better than Microsoft’s Bing Translator. Slightly better… I guess we will witness a real “Translation Race” between these two companies in the future. Well, let them race and let us reap the benefits!

————————————————————————-

1. Translation of the Robbie Williams article from Dutch to English

2. Translation of the Robbie Williams article from Dutch to Hungarian