Archive for August, 2009

The dream of a terminologist


I’m dreaming of a terminology workbench: a software environment with a customizable terminology management workflow. Just as a translator’s workbench does for the translation process, such a program should automate the different steps of terminology management and information acquisition.

terminology workbench

Which components should be included in a (multilingual) Terminology Workbench?

-Web crawler with filters
-Lemmatizer, POS tagger, tokeniser and segmentation tool (for different languages)
-Word and sentence alignment components
-Translation memory database and editor
-Text analysis with linguistic components (word concordance, collocation patterns, etc.) and a statistical component (frequency, t-score, chi-square, etc.)
-Terminology extraction (with customizable filters)
-Terminology database
-Web-based and multi-user interface

To my knowledge, no such tool has been introduced yet (please let me know if I’m wrong). There are some commercial software packages that include some of these modules as separate programs, but there is simply no tool that offers all the components a terminologist needs in one single interface. It’s such a shame, because for each task mentioned above there is open-source or free software available. So it’s just a matter of taking, say, WebReaper, HunAlign, AntConc, Olifant, ApSic Xbench, GlobalSight, Twente word aligner, OmegaT+, etc. and combining these tools into one powerful Terminology Workbench.

I do know about two interesting initiatives aimed at the development of a Terminology Workbench, but neither of them includes all the functionalities I’ve listed above.

One of these initiatives is TerminoWeb, a research project of the National Research Council of Canada. As we can read on the website, “The TerminoWeb project focuses on the development of a technology which will allow, as a medium term objective, the automatic construction of specialized ontologies (i.e. ontologies for specific domains), converging in this way with the study of terminology.” (Source: NRCC website)

I’ve had the chance to try this software, which is still in development. It has an interesting approach to terminology extraction and corpus management. More about my findings in another blog post, perhaps.

The other tool is the IHTSDO Global Health Terminology Workbench, which is part of the famous SNOMED CT (Systematized Nomenclature of Medicine – Clinical Terms), “a systematically organized computer processable collection of medical terminology covering most areas of clinical information such as diseases, findings, procedures, microorganisms, pharmaceuticals etc.” (Source: Wikipedia). Unfortunately, I haven’t had a chance to try this one. As far as I know, it’s impossible to download a demo or try it online. The SNOMED CT database with concepts and term descriptions is, however, accessible online. It is quite impressive!

As I said, both projects are interesting and have their merits, but the “ideal” Terminology Workbench isn’t there yet. So, to everyone out there, from providers of language software to the open-source community: there is still some work to be done!

language and meaning


“…meaning is not ‘in’ language; rather, language is like a recipe for constructing meaning, a recipe which relies on a lot of independent cognitive activity.” (John I. Saeed: Semantics, p. 319)