The dream of a terminologist


I’m dreaming of a terminology workbench: a software environment with a customizable terminology management workflow. Just as a translator’s workbench does for the translation process, such a program should automate the different steps of terminology management and information acquisition.

Which components should be included in a (multilingual) Terminology Workbench?

-Web crawler with filters
-Lemmatizer, POS tagger, tokeniser and segmentation tool (for different languages)
-Word and sentence alignment components
-Translation memory database and editor
-Text analysis with linguistic components (word concordance, collocation patterns, etc.) and a statistical component (frequency, t-score, chi-square, etc.)
-Terminology extraction (with customizable filters)
-Terminology database
-Web-based and multi-user interface
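
To give an idea of what the statistical component in the list above might compute, here is a minimal Python sketch of the t-score and the chi-square score for a two-word collocation candidate. The function names and the counts in the usage example are illustrative assumptions; they are not taken from any of the tools mentioned in this post.

```python
import math


def t_score(pair_count, w1_count, w2_count, total_bigrams):
    """t-score of a word pair: (observed - expected) / sqrt(observed)."""
    # Expected number of co-occurrences if the two words were independent.
    expected = w1_count * w2_count / total_bigrams
    return (pair_count - expected) / math.sqrt(pair_count)


def chi_square(pair_count, w1_count, w2_count, total_bigrams):
    """Pearson chi-square for the 2x2 contingency table of a word pair."""
    o11 = pair_count                      # w1 followed by w2
    o12 = w1_count - pair_count           # w1 without w2
    o21 = w2_count - pair_count           # w2 without w1
    o22 = total_bigrams - w1_count - w2_count + pair_count  # neither
    numerator = total_bigrams * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative counts for the pair "new companies" in a large corpus.
    counts = dict(pair_count=8, w1_count=15828, w2_count=4675,
                  total_bigrams=14307668)
    print("t-score:   ", round(t_score(**counts), 3))
    print("chi-square:", round(chi_square(**counts), 3))
```

With counts like these both scores stay low, so the pair would probably not be flagged as a collocation. A real workbench would of course compute the counts from a tokenised corpus first.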

To my knowledge, no such tool has been introduced yet (please let me know if I’m wrong). There are some commercial software packages which include some of these modules as separate programs, but there is simply no tool which offers all the components a terminologist needs in one single interface. It’s such a shame, because for each task mentioned above there is open source or free software available. So it’s just a matter of taking, say, WebReaper, HunAlign, AntConc, Olifant, ApSIC Xbench, GlobalSight, Twente word aligner, OmegaT+, etc. and combining these tools into one powerful Terminology Workbench.

I do know about two interesting initiatives aimed at the development of a Terminology Workbench but these also don’t include all the functionalities I’ve listed above.

One of these initiatives is TerminoWeb, a research project of the National Research Council of Canada. As we can read on the website “The TerminoWeb project focuses on the development of a technology which will allow, as a medium term objective, the automatic construction of specialized ontologies (i.e. ontologies for specific domains), converging in this way with the study of terminology.” (Source: Website of NRCC)

I’ve had the chance to try this software, which is still in development. It has an interesting approach to terminology extraction and corpus management. More about my findings, perhaps, in another blog post.

The other tool is called the IHTSDO Global Health Terminology Workbench, which is part of the famous SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms), “a systematically organized computer processable collection of medical terminology covering most areas of clinical information such as diseases, findings, procedures, microorganisms, pharmaceuticals etc.” (Source: Wikipedia). Unfortunately, I haven’t had a chance to try this one. As far as I know, it’s impossible to download a demo or try it online. The SNOMED CT database with concepts and term descriptions is, however, accessible online. It is quite impressive!

As I said, both projects are interesting and have their merits, but the “ideal” Terminology Workbench isn’t there yet. So, everyone out there, from providers of language software to the open source community: there is still some work to be done!

Controlled language and translation


Not long ago, Tedopres International BV, a provider of technical documentation services with headquarters in the Netherlands, launched its Controlled Language website for English (also known as Simplified English). The main objective of a controlled language is to make technical text easy to understand. The basic principles of a controlled language are a controlled vocabulary and a set of grammatical rules.


It’s so irritating when automatic translation or alignment of a sentence fails because of formatting issues or grammatical or terminological inconsistencies. An inappropriate use of a term, the use of the passive instead of the active voice, etc. can result in a lower match within our translation memory, and the translation itself will cost more in the long run, since everything less than a 100% match needs to be translated by a human translator. Not to mention the possibility of misunderstanding caused by terminological inconsistencies and unclear sentence structure.

A few words about Statistical Machine Translation (SMT) as well. One of the reasons SMT doesn’t work properly and still can’t pass the ‘Turing test’ for machine translation (if something like this exists at all) is that it can only give good translations for segments previously fed into the system. So even if you’ve got the biggest Translation Memory (TM) with all translated segments ever made in human history, you won’t succeed in translating everything, because there will always be NEW sentences, phrases, terms, named entities and words which have never been written and/or translated before. You might get a 95%, 90% or 80% match but not a 100% match, which means: human post-editing required.
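
The match percentages mentioned above can be illustrated with a small Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a fuzzy match score; commercial TM tools use their own scoring, so the numbers are only indicative. The function name `best_match` and the one-entry memory are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(segment, memory):
    """Return (score in percent, source, target) of the closest TM entry.

    `memory` maps source segments to their stored translations.
    Similarity is computed on lowercased, whitespace-separated words.
    """
    best = (0.0, None, None)
    new_words = segment.lower().split()
    for source, target in memory.items():
        ratio = SequenceMatcher(None, new_words, source.lower().split()).ratio()
        score = ratio * 100
        if score > best[0]:
            best = (score, source, target)
    return best


if __name__ == "__main__":
    # Hypothetical English-Dutch memory with a single segment.
    memory = {
        "Press the red button to stop the engine.":
            "Druk op de rode knop om de motor te stoppen.",
    }
    for new_segment in (
        "Press the red button to stop the engine.",      # unchanged
        "Press the green button to stop the engine.",    # one word changed
        "The engine is stopped by pressing the red button.",  # passive voice
    ):
        score, source, _ = best_match(new_segment, memory)
        print(f"{score:5.1f}%  {new_segment}")
```

The unchanged sentence is a full match, one changed word already lowers the score, and the passive rewording of the same instruction scores lower still, although it says exactly the same thing.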

One of the ways of optimizing MT and TMS is to use a controlled language. According to an article by Uwe Muegge (Controlled language: the next big thing in translation?), the reason why many (even bigger) companies still do not make use of available translation technology is that they don’t understand how translation tools work. I would go even further: they don’t understand that in order to be able to recycle your language material, new texts need to be RECYCLABLE!

To make this clear, translation agencies should explain to their clients the benefits of using controlled language for translation. To make optimal use of the translation solutions that agencies are investing in so heavily, it is necessary to make the most of the available language data and to control the language at the level of the source text.

There are various tools for implementing controlled language (even Tedopres offers one), but I think companies could start by using and organizing their company-specific terminology systematically. By investing in controlled language, you will save on translation costs in the long run.
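
As a rough illustration of what a controlled-language check based on company terminology could look like, here is a minimal Python sketch. The vocabulary, the word limit and the function name `check_sentence` are all hypothetical; they are not the rules of Tedopres or of any official Simplified English specification.

```python
import re

# Hypothetical controlled vocabulary: approved term -> variants to avoid.
CONTROLLED_VOCABULARY = {
    "start": {"begin", "commence", "initiate"},
    "stop": {"halt", "cease", "terminate"},
    "use": {"utilize", "employ"},
}

# Reverse lookup: variant to avoid -> approved term.
REPLACEMENTS = {
    variant: approved
    for approved, variants in CONTROLLED_VOCABULARY.items()
    for variant in variants
}


def check_sentence(sentence, max_words=20):
    """Return a list of issues found in one source sentence.

    `max_words` is an assumed sentence-length limit, not an official rule.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    issues = []
    if len(words) > max_words:
        issues.append(f"sentence has {len(words)} words (limit {max_words})")
    for word in words:
        if word in REPLACEMENTS:
            issues.append(f"use '{REPLACEMENTS[word]}' instead of '{word}'")
    return issues


if __name__ == "__main__":
    for text in ("Commence the engine test.", "Start the engine test."):
        print(text, "->", check_sentence(text) or "OK")
```

Even a simple check like this keeps the same concept from appearing under three different names in the source text, which is exactly what lowers the match rate in a translation memory.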

The next challenge is of course to create controlled languages for languages other than English, German and the other common ones. Controlled language should also sound natural and appealing to people.

On the site of Jeffrey Allen you can find some interesting articles on the role of Controlled Language in (Machine) Translation.

Terminology extraction


Both translators and translation agencies need to invest time in terminology management one way or another. Translators will usually make use of ad hoc terminology research and sometimes also of systematic terminology management in order to specialize in certain subject fields. Translation agencies, on the other hand, may carry out various projects. These projects may involve different translators working side by side on one big document or on different texts originating from the same client. In this case too, it is necessary to use and manage terminology consistently.

Good terminology management requires efficient and correct terminology extraction (or term extraction) techniques. Term extraction can not only be done after having finished the translation (using bilingual term extraction tools) but also before starting to translate a text. For example, a company terminologist, an employee at a translation agency or a translator can make a term list based on the text and its subject before the actual translation starts. This is useful in order to avoid spending precious time on searching for terms and their equivalents and to avoid terminological inconsistency.

To prepare a translation project and provide a term list in advance, one can do a monolingual term extraction using various tools. Unfortunately, automated term extraction, both mono- and bilingual, rarely yields satisfying results. The existing term extractors (you can find a short list here or here) are either too expensive or useless… or both!! Besides, most translation agencies and companies don’t have the time and the resources to take care of term extraction and terminology management, and shift the responsibility to the translators (who have even less time and resources).
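
To show how simple the core of a monolingual term extractor can be, and why its output needs human filtering, here is a minimal Python sketch. It counts word sequences of up to three words that neither start nor end with a stop word and keeps those that occur at least twice. The stop word list, the thresholds and the function name `candidate_terms` are assumptions made for this example; this is not how any particular commercial extractor works.

```python
import re
from collections import Counter

# Small illustrative stop word list; a real one would be much longer.
STOPWORDS = {
    "a", "an", "the", "of", "to", "and", "or", "in", "on", "for", "with",
    "by", "is", "are", "be", "must", "before", "each",
}


def candidate_terms(text, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs, most frequent first."""
    counts = Counter()
    # Candidates never cross a sentence or clause boundary.
    for sentence in re.split(r"[.!?;:]", text.lower()):
        tokens = re.findall(r"[a-z][a-z-]*", sentence)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip sequences that begin or end with a stop word.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common()
            if freq >= min_freq]


if __name__ == "__main__":
    sample = (
        "The fuel pump feeds the engine. Check the fuel pump before each "
        "flight. A damaged fuel pump must be replaced."
    )
    for term, freq in candidate_terms(sample):
        print(freq, term)
```

The output still mixes the real term with its parts, so someone has to review the list. That is the "semi-automatic" part of the job.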

Luckily, there are some cheap or even free tools which can help translators or companies analyze and process texts and make term lists for major projects semi-automatically. Three such tools are ApSIC Xbench, WebCorp and AntConc, but more on this next time. Have to pack my stuff now, flying to Canada tomorrow…

The Dutch Network for Terminology


A couple of words on my job. As many of you know, besides working for ExacTerm, I’m also part of the DNT-Team (DNT stands for Dutch Network for Terminology or Steunpunt Nederlandstalige Terminologie in Dutch).


The Dutch Network for Terminology (DNT) was founded in 2007 by the Dutch Language Union, a Dutch-Flemish government institution. The DNT functions as a non-commercial information center for all aspects of terminology and serves the entire Dutch-speaking community. We give advice on terminological research (e.g. on methodology, availability of tools and their use, literature etc.) to anyone who is involved in terminology-related work (companies, organizations, translators, terminologists, teachers, scientists etc.).
On behalf of the Dutch Language Union, the DNT maintains the website NedTerm. NedTerm provides information on terminology activities (workshops, conferences, etc.), includes a bibliography of terminological works and an overview of terminology training courses in the Netherlands and in Flanders, and provides information on standardization issues and on sources for terminology work (e.g. links to term collections freely accessible on the Internet).

The DNT also organizes terminology trainings and workshops for translators and master classes for scientists and language experts on different subjects relating to terminological research (thesauri, ontologies, semantic web applications, etc.).

Besides organizing in-depth study days, we also offer practical solutions to our target group. One example is an ad hoc terminology course, a methodology I summarized on this blog a couple of days ago.

If you want more information on the DNT and our working methods or you are searching for partners for a Terminology project, feel free to contact me.

Terminology Management is hot


Terminology management is becoming more and more popular; at least, that’s what some recent reports and surveys confirm.

The UN International Strategy for Disaster Reduction (UNISDR) agency released its 2009 Terminology Report. Its goal is “to promote common understanding and common usage of disaster risk reduction concepts and to assist the disaster risk reduction efforts of authorities, practitioners and the public.”


SDL conducted a survey on terminology management (see my previous blog on this). Its 330 respondents included business and localization professionals, predominantly in the IT, software, and manufacturing sectors. According to the results, 95 percent of those taking the survey “recognized the necessity to have the appropriate processes in place to manage their terminology and localization terminology,” but they often found inconsistencies in the source content. Participants in the survey linked terminology to maintaining brand consistency and increasing productivity.

A survey by the Common Sense Advisory, “How to Avoid Terminology Mismanagement”, defines the terms used in terminology management, flags the most common technology solutions, and enumerates practical requirements for choosing a software solution.

And finally, another report by the Common Sense Advisory, the Case for Terminology Management, based on interviews with corporate and government terminologists, also emphasizes the importance of terminology management in a corporate environment.

(Well, just a footnote: if companies finally realize how important terminology management is for growing and reaching foreign markets as well as making internal and external communication possible, maybe they should also start thinking about SHARING THEIR TERMINOLOGY! More on this maybe another time…)


Survey on Terminology Management by SDL

According to a recent survey by SDL, there is a strong link between Terminology and Branding. Big corporations as well as translation professionals recognize the need for effective terminology management and branding.
SDL, the leading provider of TMS (Translation Management Systems), announced the results of its two surveys exploring trends in terminology and branding; the first completed by business professionals and the second by translators. The results clearly identified a strong link between terminology and brand, highlighting the growing awareness of and need for effective terminology management solutions to maintain a consistent global brand. 
You can read the full research paper on this survey at


Ad hoc terminology


What is ad hoc terminology? In Wikipedia, we find a distinction between ad hoc terminology and systematic terminology:

“Ad hoc terminology […] deals with a single term or a limited number of terms.

Systematic terminology […] deals with all the terms in a specific subject field or domain of activity”


According to the same definition, ad hoc terminology “is prevalent in the translation profession, where a translation for a specific term (or group of terms) is required quickly to solve a particular translation problem.”


Another resource, the COTSOES – Recommendations for Terminology Work, attaches similar importance to ad hoc terminology:


Every day translation services have to solve individual terminological problems as quickly as possible. These usually involve terms, neologisms or official expressions which are not in dictionaries or unconfirmed equivalents of terms.





Although both resources underline the importance of ad hoc research in terminology, no elaborated and tested methodology is available yet for translators who deal with terminology on a daily, ad hoc basis. Next time, I will give a short summary of a methodology for ad hoc terminology research worked out at the Dutch Network for Terminology. In the meantime, please wait patiently.

Swansea, March 2004


Once upon a time, there was a conference in Swansea (Wales), with the same title as this blog. Organized by Pius ten Hacken (Swansea) and Willy Martin (Vrije Universiteit Amsterdam), this two-day conference opened up new perspectives for the interdisciplinary science of Terminology.


According to a description of the workshop (to be found on the Swansea University website):

“The topic of terminology has been approached traditionally from the perspective of standardization. More recently, corpus-based approaches have gained prominence. A question which is relevant to both approaches concerns the relationship of terminology to a theory of the lexicon.

In this conference, these perspectives were considered not only theoretically, but also from a practical angle. The study and management of terminology is an essential component of commercial, technical, and scientific translation. Computational tools provide an almost indispensible help to any translator, whether working in an institutional translation service, an independent translation company, or as a free-lance translator.”

A couple of weeks ago, sitting in my favourite armchair and sipping a nice cup of coffee, I decided that the spirit and innovation of this conference had to be continued. So here it is: a new blog dedicated to

Terminology, Computing and Translation.

Enjoy your reading and don’t hesitate to react.