The dream of a terminologist


I’m dreaming of a terminology workbench: a software environment with a customizable terminology management workflow. Just as a translator’s workbench does for the translation process, such a program should automate the different steps of terminology management and information acquisition.


Which components should be included in a (multilingual) Terminology Workbench?

-Web crawler with filters
-Lemmatizer, POS tagger, tokenizer and segmentation tool (for different languages)
-Word and sentence alignment components
-Translation memory database and editor
-Text analysis with linguistic components (word concordance, collocation patterns, etc.) and a statistical component (frequency, t-score, chi-square, etc.)
-Terminology extraction (with customizable filters)
-Terminology database
-Web-based, multi-user interface
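To make the statistical component a bit more concrete, here is a minimal Python sketch of two of the association measures listed above, t-score and chi-square, computed for adjacent word pairs (bigrams). It assumes the text has already been tokenized into a list of lowercase words. The function name `bigram_stats` is made up for illustration, and the formulas are the usual textbook ones (t-score approximated as (O - E)/√O, chi-square from a 2×2 contingency table), not necessarily those used by any particular tool.

```python
import math
from collections import Counter


def bigram_stats(tokens):
    """Return {(w1, w2): (freq, t_score, chi_square)} for every adjacent bigram.

    Hypothetical helper; assumes `tokens` is an already tokenized word list.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    pair_freq = Counter(bigrams)
    # Marginals: how often a word occurs as first / second member of a bigram.
    first_freq = Counter(w1 for w1, _ in bigrams)
    second_freq = Counter(w2 for _, w2 in bigrams)

    stats = {}
    for (w1, w2), o11 in pair_freq.items():
        # Expected co-occurrence count if w1 and w2 were independent.
        expected = first_freq[w1] * second_freq[w2] / n
        # t-score approximation: (observed - expected) / sqrt(observed).
        t_score = (o11 - expected) / math.sqrt(o11)
        # 2x2 contingency table for the chi-square test.
        o12 = first_freq[w1] - o11   # w1 followed by something other than w2
        o21 = second_freq[w2] - o11  # w2 preceded by something other than w1
        o22 = n - o11 - o12 - o21    # bigrams with neither w1 first nor w2 second
        denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
        # A zero denominator means the test is undefined; report 0.0 here.
        chi_square = n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0
        stats[(w1, w2)] = (o11, t_score, chi_square)
    return stats
```

Usage: `bigram_stats("the term base and the term list share the term base format".split())`, then sort the items by the score of interest. On real text the highest t-scores tend to go to frequent pairs containing function words, which is one reason the extraction step needs filters.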

To my knowledge, no such tool has been introduced yet (please let me know if I’m wrong). There are some commercial software packages that offer some of these modules, but as separate programs. There is simply no tool that brings all the components a terminologist needs together in one single interface. It’s such a shame, because for each task mentioned above there is open-source or free software available. So it’s just a matter of taking, say, WebReaper, HunAlign, AntConc, Olifant, ApSIC Xbench, GlobalSight, the Twente word aligner, OmegaT+, etc. and combining these tools into one powerful Terminology Workbench.
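Combining separate tools largely comes down to agreeing on what gets passed from one step to the next. Here is a minimal Python sketch of that idea, assuming a hypothetical shared `Project` object and treating every module as a function from `Project` to `Project`. The two toy steps only stand in for real components; wrapping an actual tool such as HunAlign or AntConc is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Project:
    """Hypothetical shared state passed from one workbench module to the next."""
    texts: List[str]
    tokens: List[List[str]] = field(default_factory=list)
    candidates: Dict[str, int] = field(default_factory=dict)


# Each module is a function Project -> Project, so a wrapper around any
# external tool (crawler, aligner, extractor...) could be plugged in the same way.
Step = Callable[[Project], Project]


def tokenize(project: Project) -> Project:
    """Toy tokenizer: lowercase and split on whitespace."""
    project.tokens = [text.lower().split() for text in project.texts]
    return project


def count_candidates(project: Project) -> Project:
    """Toy 'extractor': count every token as a term candidate."""
    for sentence in project.tokens:
        for token in sentence:
            project.candidates[token] = project.candidates.get(token, 0) + 1
    return project


def run_workflow(project: Project, steps: List[Step]) -> Project:
    """Run the configured steps in order; the step list is the customizable workflow."""
    for step in steps:
        project = step(project)
    return project
```

Usage: `run_workflow(Project(texts=["Term base", "term list"]), [tokenize, count_candidates])`. Customizing the workflow then simply means changing the list of steps.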

I do know about two interesting initiatives aimed at developing a Terminology Workbench, but neither includes all the functionalities I’ve listed above.

One of these initiatives is TerminoWeb, a research project of the National Research Council of Canada. As the project website puts it: “The TerminoWeb project focuses on the development of a technology which will allow, as a medium term objective, the automatic construction of specialized ontologies (i.e. ontologies for specific domains), converging in this way with the study of terminology.” (Source: Website of NRCC)

I’ve had the chance to try this software, which is still in development. It takes an interesting approach to terminology extraction and corpus management. More about my findings in another blog post, maybe.

The other tool is the IHTSDO Global Health Terminology Workbench, which is part of the well-known SNOMED CT (Systematized Nomenclature of Medicine – Clinical Terms), “a systematically organized computer processable collection of medical terminology covering most areas of clinical information such as diseases, findings, procedures, microorganisms, pharmaceuticals etc.” (Source: Wikipedia). Unfortunately, I haven’t had a chance to try this one: as far as I know, there is no demo to download or try online. The SNOMED CT database of concepts and term descriptions, however, is accessible online. It is quite impressive!

As I said, both projects are interesting and have their merits, but the “ideal” Terminology Workbench isn’t there yet. So, to everyone out there, from providers of language software to the open-source community: there is still some work to be done!

Terminology extraction


Both translators and translation agencies need to invest time in terminology management one way or another. Translators will usually do ad hoc terminology research, and sometimes also systematic terminology management in order to specialize in certain subject fields. Translation agencies, on the other hand, may carry out various projects. These projects may involve different translators working side by side on one big document, or on different texts from the same client. Here, too, terminology has to be used and managed consistently.


Good terminology management requires efficient and correct terminology extraction (or term extraction) techniques. Term extraction can be done not only after the translation is finished (using bilingual term extraction tools) but also before translation starts. For example, a company terminologist, an employee at a translation agency or a translator can make a term list based on the text and its subject before the actual translation starts. This avoids spending precious time searching for terms and their equivalents, and it helps prevent terminological inconsistency.
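A term list prepared in advance also makes it possible to check consistency once the translation is done. Here is a minimal Python sketch of such a check, assuming the project glossary is a plain dictionary mapping each source term to its approved target term, and the translation is available as (source, target) segment pairs. `check_term_consistency` and the English–German entries are hypothetical examples.

```python
def check_term_consistency(segments, glossary):
    """Flag segment pairs whose source contains a glossary term but whose
    target lacks the approved equivalent.

    segments: list of (source, target) strings
    glossary: dict mapping source term -> approved target term
    Returns a list of (segment_index, source_term, expected_target_term).
    """
    issues = []
    for i, (source, target) in enumerate(segments):
        src, tgt = source.lower(), target.lower()
        for src_term, tgt_term in glossary.items():
            # Plain case-insensitive substring matching: no lemmatization,
            # so inflected forms are only caught when the base form is a substring.
            if src_term.lower() in src and tgt_term.lower() not in tgt:
                issues.append((i, src_term, tgt_term))
    return issues
```

Usage: with `glossary = {"term base": "Termdatenbank"}`, the pair `("Open the term base.", "Öffnen Sie die Terminologiedatenbank.")` is flagged, because the approved equivalent was not used.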

To prepare a translation project and provide a term list in advance, one can run a monolingual term extraction using various tools. Unfortunately, automated term extraction, whether monolingual or bilingual, rarely yields satisfying results. The existing term extractors (you can find a short list here or here) are either too expensive or useless… or both!! Besides, most translation agencies and companies don’t have the time and resources to take care of term extraction and terminology management, so they shift the responsibility to the translators (who have even less time and fewer resources).
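For a rough idea of what the simplest kind of monolingual extractor does, here is a Python sketch of frequency-based candidate extraction: it counts word sequences (n-grams) and filters out those that begin or end with a stopword. The tiny stopword list, the function name `extract_candidates` and the `max_n`/`min_freq` thresholds are illustrative assumptions. Real extractors typically add linguistic filters (POS patterns, lemmatization) on top of this.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real extractor would use a
# full, language-specific list and ideally POS patterns as well.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "for", "on", "with"}


def extract_candidates(text, max_n=3, min_freq=2):
    """Return (candidate, frequency) pairs for 1- to max_n-grams seen at least min_freq times.

    Filter: an n-gram may not start or end with a stopword (so single
    words that are stopwords are dropped too).
    """
    # Unicode-aware words (letters only), allowing internal hyphens.
    tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(
        ((term, freq) for term, freq in counts.items() if freq >= min_freq),
        key=lambda item: (-item[1], item[0]),
    )
```

On the toy text "Term extraction tools support term extraction. A term list helps the translator. The term list is shared." this returns `[("term", 4), ("extraction", 2), ("list", 2), ("term extraction", 2), ("term list", 2)]`, which also shows the typical noise: "term" and "list" on their own are hardly useful entries, so the list still needs human review.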

Luckily, there are some cheap or even free tools that can help translators and companies analyze and process texts and compile term lists for major projects semi-automatically. Three such tools are ApSIC Xbench, WebCorp and AntConc, but more on this next time. I have to pack my stuff now: flying to Canada tomorrow…
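Concordancers such as AntConc are built around keyword-in-context (KWIC) views, where every occurrence of a search word is shown with its surrounding text. Here is a minimal Python sketch of the idea, assuming plain-text input; `kwic` and its `width` parameter are hypothetical names, not AntConc’s interface.

```python
import re


def kwic(text, keyword, width=30):
    """Keyword-in-context lines: each whole-word, case-insensitive hit of
    `keyword` shown with up to `width` characters of context on each side."""
    # Collapse line breaks and repeated spaces so each hit fits on one line.
    text = " ".join(text.split())
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()].strip()
        right = text[match.end():match.end() + width].strip()
        # Right-align the left context so all hits line up in one column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines
```

Usage: `kwic("The term base stores each term with its equivalents.", "term", width=10)` returns `["       The [term] base stor", " ores each [term] with its"]`; the aligned column is what makes recurring patterns around a term easy to spot.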

Terminology Management is hot


Terminology management is becoming more and more popular, at least that’s what some recent reports and surveys confirm.

The UN International Strategy for Disaster Reduction (UNISDR) agency released its 2009 Terminology Report. Its goal is “to promote common understanding and common usage of disaster risk reduction concepts and to assist the disaster risk reduction efforts of authorities, practitioners and the public.”


SDL conducted a survey on terminology management (see my previous blog post on this). Its 330 respondents were business and localization professionals, predominantly in the IT, software and manufacturing sectors. According to the results, 95 percent of respondents “recognized the necessity to have the appropriate processes in place to manage their terminology and localization terminology,” but they often found inconsistencies in the source content. Participants linked terminology to maintaining brand consistency and increasing productivity.

A Common Sense Advisory survey, “How to Avoid Terminology Mismanagement,” defines the terms used in terminology management, flags the most common technology solutions and lists practical requirements for choosing a software solution.

And finally, another Common Sense Advisory report, “The Case for Terminology Management,” based on interviews with corporate and government terminologists, also emphasizes the importance of terminology management in a corporate environment.

(Well, just a footnote: if companies finally realize how important terminology management is for growing and reaching foreign markets, as well as for making internal and external communication possible, maybe they should also start thinking about SHARING THEIR TERMINOLOGY! More on this another time, maybe…)