Terminology symposium and workshop at Euralex 2010

27/04/2010

The Dutch Terminology Service Centre and the Dutch Association for Terminology will organize a symposium and a workshop on terminology during Euralex 2010 in Leeuwarden. You can read more here:

http://taalunieversum.org/taal/terminologie/programme_symposium/index.php

http://www.euralex.nl/

The dream of a terminologist

28/08/2009

I’m dreaming of a terminology workbench: a software environment with a customizable terminology management workflow. Just as a translator’s workbench does for the translation process, such a program should automate the different steps of terminology management and information acquisition.

Which components should be included in a (multilingual) Terminology Workbench?

-Web crawler with filters
-Lemmatizer, POS tagger, tokeniser and segmentation tool (for different languages)
-Word and sentence alignment components
-Translation memory database and editor
-Text analysis with linguistic components (word concordance, collocation patterns, etc.) and a statistical component (frequency, t-score, chi-square, etc.)
-Terminology extraction (with customizable filters)
-Terminology database
-Web-based and multi-user interface
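
To give an idea of what the statistical component in the list above would compute, here is a minimal sketch of a t-score for two-word collocations. It assumes the common corpus-linguistics approximation t = (O - E) / sqrt(O), where O is the observed frequency of the word pair and E the frequency expected if the two words were independent. The function name and the tiny sample text are my own, purely for illustration.

```python
import math
from collections import Counter

def bigram_t_scores(tokens):
    """Return a t-score for every adjacent word pair in a token list."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        # Expected pair frequency if w1 and w2 occurred independently
        expected = unigrams[w1] * unigrams[w2] / n
        scores[(w1, w2)] = (observed - expected) / math.sqrt(observed)
    return scores

tokens = "term base term base the term the base".split()
scores = bigram_t_scores(tokens)
# Show the pair with the highest score
print(max(scores, key=scores.get))
```

On a real corpus you would also set a minimum frequency, because the approximation is unreliable for pairs that occur only once or twice.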

To my knowledge, no such tool has been introduced yet (please let me know if I’m wrong). There are some commercial software packages which include some of these modules in separate programs, but there is simply no tool which offers all the components a terminologist needs in one single interface. It’s such a shame, because for each task mentioned above there is open source or free software available. So it’s just a matter of taking, say, WebReaper, HunAlign, AntConc, Olifant, ApSIC Xbench, GlobalSight, Twente word aligner, OmegaT+, etc. and combining these tools into one powerful Terminology Workbench.

I do know of two interesting initiatives aimed at developing a Terminology Workbench, but neither of them includes all the functionality I’ve listed above.

One of these initiatives is TerminoWeb, a research project of the National Research Council of Canada. As we can read on the website “The TerminoWeb project focuses on the development of a technology which will allow, as a medium term objective, the automatic construction of specialized ontologies (i.e. ontologies for specific domains), converging in this way with the study of terminology.” (Source: Website of NRCC)

I’ve had the chance to try this software, which is still in development. It has an interesting approach to terminology extraction and corpus management. I may write more about my findings in another post.

The other tool is called IHTSDO Global Health Terminology Workbench, which is part of the famous SNOMED CT (Systematized Nomenclature of Medicine – Clinical Terms), “a systematically organized computer processable collection of medical terminology covering most areas of clinical information such as diseases, findings, procedures, microorganisms, pharmaceuticals etc.” (Source: Wikipedia). Unfortunately, I haven’t had a chance to try this one. As far as I know, it’s impossible to download a demo or try it online. The SNOMED CT database with concepts and term descriptions is, however, accessible online. It is quite impressive!

As I said, both projects are interesting and have their merits, but the “ideal” Terminology Workbench isn’t there yet. So, everyone out there, from providers of language software to the open source community: there is still some work to be done!

language and meaning

28/08/2009

“…meaning is not ‘in’ language; rather, language is like recipe for constructing meaning, a recipe which relies on a lot of independent cognitive activity.” (John I. Saeed: Semantics p.319)

Controlled language and translation

28/07/2009

Not long ago Tedopres International BV, a provider of technical documentation services with headquarters in the Netherlands, launched its Controlled Language website for English (also known as Simplified English). The main objective of a controlled language is to make technical text easy to understand. The basic principles of a controlled language are a controlled vocabulary and a set of grammatical rules.
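
To make the two principles a bit more concrete, here is a minimal sketch of a checker with one vocabulary rule and one very simple writing rule (a maximum sentence length). The word list and the limit are hypothetical and chosen only for this illustration; they are not taken from Simplified English or from any Tedopres product.

```python
import re

# Hypothetical mini-vocabulary and sentence-length limit, for illustration only
APPROVED_WORDS = {"remove", "install", "the", "cover", "before", "you", "pump"}
MAX_WORDS = 20

def check_sentence(sentence, approved=APPROVED_WORDS, max_words=MAX_WORDS):
    """Return a list of rule violations found in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    issues = []
    # Vocabulary rule: every word must be on the approved list
    unapproved = sorted({w for w in words if w not in approved})
    if unapproved:
        issues.append("unapproved words: " + ", ".join(unapproved))
    # Writing rule: keep sentences short
    if len(words) > max_words:
        issues.append(f"sentence has {len(words)} words (limit {max_words})")
    return issues

print(check_sentence("Remove the cover before you install the pump."))
print(check_sentence("Detach the cover."))
```

A real controlled language has far more rules (approved meanings per word, verb forms, active voice and so on), but the checking principle is the same: compare each sentence against a fixed word list and rule set.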

Links:

http://www.controlledenglish.com

http://www.simplifiedenglish.net

It’s so irritating when automatic translation or alignment of a sentence fails because of formatting issues or grammatical/terminological inconsistencies. An inappropriate use of a term, the use of the passive instead of the active voice, etc. can result in a lower match in the translation memory, and the translation itself will cost more in the long run, since everything less than a 100% match needs to be translated by a human translator. Not to mention the possibility of misunderstanding caused by terminological inconsistencies and unclear sentence structure.

A few words about Statistical Machine Translation (SMT) as well. One of the reasons SMT doesn’t work properly and still can’t pass the ‘Turing test’ for machine translation (if such a thing exists at all) is that it can only give good translations for segments previously fed into the system. So even if you had the biggest Translation Memory (TM), containing every segment ever translated in human history, you wouldn’t succeed in translating everything, because there will always be NEW sentences, phrases, terms, named entities and words which have never been written and/or translated before. You might get a 95%, 90% or 80% match but not a 100% match, which means: human post-editing required.
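
For readers who have never seen how such a match percentage comes about, here is a minimal sketch. It uses the character-based similarity ratio from Python’s standard difflib module as a stand-in; commercial TM tools use their own (usually word-based and weighted) scoring, so the numbers will differ. The function name and the one-segment memory are hypothetical.

```python
from difflib import SequenceMatcher

def best_tm_match(segment, translation_memory):
    """Return (stored source, stored translation, match %) for the closest TM entry."""
    best = (None, None, 0)
    for source, target in translation_memory.items():
        ratio = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        percent = round(ratio * 100)
        if percent > best[2]:
            best = (source, target, percent)
    return best

# Hypothetical English-Dutch memory with a single stored segment
tm = {"Press the red button.": "Druk op de rode knop."}

print(best_tm_match("Press the red button.", tm))    # identical segment
print(best_tm_match("Press the green button.", tm))  # one word changed
```

One changed word is enough to drop the second segment below 100%, and that is exactly the point where a human has to step in.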

One of the ways of optimizing MT and TMS is to use a controlled language. According to an article by Uwe Muegge (Controlled language: the next big thing in translation?), the reason why many (even bigger) companies still do not make use of the available translation technology is that they don’t understand how translation tools work. I would go even further: they don’t understand that in order to be able to recycle your language material, new texts need to be RECYCLABLE!

To make this clear, translation agencies should explain to their clients the benefits of using controlled language for translation. Translation agencies are investing heavily in various translation solutions, and to get the most out of them it is necessary to make the most of the available language data and to control the language at the level of the source text.

There are various tools for implementing controlled language (even Tedopres offers one), but I think companies could start by using and organizing their company-specific terminology systematically. By investing in controlled language you will save on translation costs in the long run.

The next challenge is of course to create controlled languages for languages other than English, German and the other common ones. And controlled language should also sound natural and appealing to people.

On Jeffrey Allen’s site you can find some interesting articles on the role of Controlled Language in (Machine) Translation.

SfEP – annual conference

17/07/2009

Just a day or so after I had published my blog on “the death of the reviewer” I found this:

SfEP AGM and 20th annual conference
Editing in the 21st century
Vanbrugh College, University of York
Sunday 13 to Tuesday 15 September 2009

Here is also the link for more info.

I don’t know if Renato Beninatto is going to join this event, but I think it would be a great occasion to present his PCTP translation quality concept (check here) and its implications for the profession of the editor/proofreader. It would also be a chance to get plenty of feedback on Renato’s newly proposed translation workflow from ‘the real players’ in the field.

The death of the reviewer?

10/07/2009

Do translation agencies still need reviewers in the age of high-tech spell-checkers, term validation software, terminology management tools, translator’s workbenches, on-line crowd-sourcing, translation forums, etc.? Or can we expect translators to deliver high-quality translations that need no further improvement by a reviewer?

In an interview with Renato Beninatto, known for his statement “Quality doesn’t matter”, we can read the following:

<<What I propose is that you eliminate the editor. You have a Project Manager whose function changes from manager to “facilitator”. And you create a “community” (a discussion list, a portal, it can take any shape that you want), where you have the translators, a consultant, an expert on the topic whose job is to answer questions about the topic in the corresponding language.>>

It is an interesting thought to shift the task of the reviewer to the project manager, to experts and to other freelance translators. The quality of the translations surely wouldn’t suffer under this new workflow. In my opinion, it would even improve, since an expert has a more thorough knowledge of the terminology and the conceptual structure of a given subject field than a reviewer (who has only a superficial knowledge of most domains). Moreover, if you are monitored by a pool of translators in the form of a translation forum, as suggested above, the chances of solving all the linguistic problems of the translation are bigger than when only one person, the reviewer, reads through and corrects your text. So a pool of experts and fellow translators, combined with a project manager who is able to manage the transfer of information among all the participants well, would together form a perfect translation team.

I think that for large projects at large companies the scenario mentioned above is a real possibility. For smaller agencies with much less profit, this working method would be a bit harder (but not impossible) to implement. With the open source software available nowadays, it’s a piece of cake to set up a forum, exploit on-line term bases and knowledge banks, and make use of translation software and a TMS.

So I’m quite positive about this new way of structuring the translation process with new tasks and roles. And of course, it’s also worth reading that interview with Renato Beninatto. You can find it on the blog called ‘Lapsus Translinguae’, which is, by the way, a very interesting blog.

Two things about Google translate

25/06/2009

I’ve just read an article about how poor the Hungarian translations produced by Google are, and I’ve come to two interesting conclusions.

1. Google uses English in case of uncommon language combinations 

In my experiment, I Google-translated a Dutch article about Robbie Williams into Hungarian and compared it to the English translation of the same article. Checking the translation mistakes, it becomes obvious that Google (not having enough language material for the language combination Dutch-Hungarian) uses English to link the two languages. Often, this results in unnecessary mistakes.

For example, the Dutch ‘een shaggie draait’ = ‘roll a cigarette’ becomes ‘a shaggie running’ in English and, following this pattern, it becomes ‘a shaggie fut’ in Hungarian, where ‘fut’ means ‘run’. So this is evidence that Google translates the text (or at least part of it) via English. It’s clever in a way, but not very efficient.

2. Secondly, and I’m quite irritated by this, if Google doesn’t have a translation for a word or term, the word is simply copied into the translation as it is (see ‘shaggie’ in the previous example). For users working in languages with few or no loan words from English, French, etc. this won’t cause too many problems: you simply know that Google didn’t find the right translation for this word. But quite often it will be confusing, especially when translating into languages with lots of loan words (Dutch, for example). And when you use Google Translate for scientific texts containing lots of terms, it will be even more confusing (if you type the English ‘diaphragma’ into Google, you get ‘diaphragma’ in Dutch, but it should actually be ‘diafragma’ with an ‘f’).
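
Both observations can be mimicked with a toy word-for-word lookup. To be clear, this is not how Google Translate works internally; it is only a sketch of the two effects, using hypothetical mini-dictionaries that contain nothing but the words from the ‘shaggie’ example.

```python
def lookup(words, dictionary):
    """Translate word by word; unknown words are copied over unchanged."""
    return [dictionary.get(word, word) for word in words]

def pivot_translate(text, source_to_pivot, pivot_to_target):
    """Translate via a pivot language and return (pivot text, target text)."""
    pivot_words = lookup(text.split(), source_to_pivot)
    target_words = lookup(pivot_words, pivot_to_target)
    return " ".join(pivot_words), " ".join(target_words)

# Hypothetical mini-dictionaries, limited to the example in this post
DUTCH_TO_ENGLISH = {"een": "a", "draait": "running"}
ENGLISH_TO_HUNGARIAN = {"a": "a", "running": "fut"}

english, hungarian = pivot_translate(
    "een shaggie draait", DUTCH_TO_ENGLISH, ENGLISH_TO_HUNGARIAN
)
print(english)    # a shaggie running
print(hungarian)  # a shaggie fut
```

The wrong choice for ‘draait’ is made in the first step and faithfully carried into Hungarian by the second, while ‘shaggie’ is in neither dictionary and travels through both steps untouched.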

There are probably a few more things to mention about Google Translate, but instead of going on about the mistakes, I can tell you that it is still slightly better than Microsoft’s Bing Translator. Slightly better… I guess we will witness a real “Translation Race” between these two companies in the future. Well, let them run the race and let us reap the benefits!

————————————————————————-

1. Translation of the Robbie Williams article from Dutch to English

2. Translation of the Robbie Williams article from Dutch to Hungarian

Linguee, the future favorite of translators

19/06/2009

Linguee is a very large corpus of web-based translated materials from live online sources. The data is displayed in-context together with links to the originating sites.

With Linguee, you can search through many millions of bilingual texts in English and German for words and expressions. Every expression is accompanied by useful additional information and suitable example sentences.

Though only German and English are covered at this point, I think we can expect other languages in the near future. The tool is certainly very useful for English <> German translators, since it yields impressive search results. One very important feature is that the source of the words, their translations and their contexts can be easily traced. This is an advantage over Google Translate or TAUS, in which the sources remain vague or unknown.

Just to see, I did a little experiment with the TAUS language portal and Linguee: I searched for the German translation of the word “council”. It was quite interesting to see how many more results there are in Linguee than in the TAUS language portal (dozens of examples vs. only 3). I also find Linguee’s result page much richer than that of TAUS.

Try it yourself!

(Click on this link to see my search result for “Council” in Linguee: http://www.linguee.com/search?query=council)

Terminology extraction

11/05/2009

Both translators and translation agencies need to invest time in terminology management one way or another. Translators will usually make use of ad hoc terminology research and sometimes also of systematic terminology management in order to specialize in certain subject fields. Translation agencies, on the other hand, may carry out various projects. These projects may involve different translators working side by side on one big document or on different texts originating from the same client. In this case, too, it is necessary to use and manage terminology consistently.

Good terminology management requires efficient and correct terminology extraction (or term extraction) techniques. Term extraction can be done not only after the translation is finished (using bilingual term extraction tools) but also before translation starts. For example, a company terminologist, an employee at a translation agency or a translator can make a term list based on the text and its subject before the actual translation begins. This saves precious time otherwise spent searching for terms and their equivalents, and it helps to avoid terminological inconsistency.

To prepare a translation project and provide a term list in advance, one can do a monolingual term extraction using various tools. Unfortunately, automated term extraction, both mono- and bilingual, rarely yields satisfying results. The existing term extractors (you can find a short list here or here) are either too expensive or useless… or both! Besides, most translation agencies and companies don’t have the time and the resources to take care of term extraction and terminology management, and shift the responsibility to the translators (who have even less time and fewer resources).
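
To show what the crudest form of monolingual term extraction looks like, here is a minimal sketch: it counts all word sequences of up to three words that neither start nor end with a stopword and keeps those that occur at least twice. The stopword list, the thresholds and the function name are my own assumptions; this is not the method of any of the tools mentioned in this post, and a human still has to clean up the resulting candidate list.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real one would be much longer
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to",
             "is", "are", "for", "with"}

def term_candidates(text, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs, most frequent first."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # A candidate may not start or end with a stopword
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]

text = "The translation memory stores segments. A translation memory is reused."
for term, freq in term_candidates(text):
    print(freq, term)
```

Note that the sketch ignores sentence boundaries and knows nothing about parts of speech, which is one reason why purely frequency-based extraction produces so much noise.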

Luckily, there are some cheap or even free tools which can help translators or companies analyze and process texts and make term lists for major projects semi-automatically. Three such tools are ApSIC Xbench, WebCorp and AntConc, but more on this next time. I have to pack my stuff now, I’m flying to Canada tomorrow…

TMS SaaS

05/05/2009

In my opinion, a TMS (Translation Management System) on a SaaS (Software as a Service) basis is and will be the ultimate solution for small and mid-sized translation agencies trying to remain competitive in the translation marketplace. SaaS gives companies the option to lease software rather than purchase it, with an unlimited number of licences. It enables remote (on-line, web-based) access to software which is installed on the provider’s server.

ExacTerm, for instance, offers an on-line TMS for translation agencies at a very low monthly fee. In this TMS, you can store your translation memories and terminology collections and automate your translation process. Sounds nice, but is it safe? How can you make sure that no one else can get to your texts and data? Can safety be guaranteed when texts and data are transferred over the web?

Generally speaking, data stored on the provider’s server is more secure than data stored on company servers or PCs. The reason is simply that not many small businesses can afford fully secured systems with anti-hacking and anti-virus protection, backup facilities, emergency power supply and alternative Internet access. In this respect, SaaS providers are no doubt stronger and better equipped.

Apart from that, there are other reasons why translation agencies should opt for TMS SaaS. First of all, the costs of software implementation, set-up and update/upgrade services are usually included in the overall monthly fee. Secondly, instead of paying per licence, you are charged on a monthly basis, which makes the service much cheaper than a commercial TMS. Furthermore, the system is flexible enough to be shaped according to the special wishes of the client. And finally, troubleshooting and other related ICT services are done by the provider, so you save money on those costs as well.

The SaaS model is a perfect solution for translation and terminology management. SaaS allows small businesses to grow and develop without making huge investments in commercial TMSs.

