Building Domain Specific Bilingual Dictionaries

This paper proposes a method to build bilingual dictionaries for specific domains defined by a parallel corpus. The proposed method is based on an original method that is not domain specific. Both the original and the proposed methods are built from previously available natural language processing tools. Therefore, this paper's contribution resides in the choice and parametrization of those tools. To illustrate the benefits of the proposed method, we conduct an experiment over technical manuals in English and Portuguese. The results of the proposed method were analyzed by human specialists, and they indicate significant increases in precision for unigrams and multi-grams. Numerically, the precision increase is as high as 15% according to our evaluation.
