Compiling French-Japanese Terminologies from the Web

We propose a method for compiling bilingual terminologies of multi-word terms (MWTs) for given translation pairs of seed terms. Traditional methods for bilingual terminology compilation exploit parallel texts, while the more recent ones have focused on comparable corpora. We use bilingual corpora collected from the web and tailor made for the seed terms. For each language, we extract from the corpus a set of MWTs pertaining to the seed’s semantic domain, and use a compositional method to align MWTs from both sets. We increase the coverage of our system by using thesauri and by applying a bootstrap method. Experimental results show high precision and indicate promising prospects for future developments.

[1]  Satoshi Sato,et al.  Automatic Collection of Related Terms from the Web , 2003, ACL.

[2]  Pascale Fung,et al.  A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora , 1998, AMTA.

[3]  Pierre Zweigenbaum,et al.  Looking for Candidate Translational Equivalents in Specialized, Comparable Corpora , 2002, COLING.

[4]  Fatiha Sadat,et al.  An Approach Based on Multilingual Thesauri and Model Combination for Bilingual Lexicon Extraction , 2002, COLING.

[5]  Éric Gaussier,et al.  Towards Automatic Extraction of Monolingual and Bilingual Terminology , 1994, COLING.

[6]  Sophia Ananiadou,et al.  The C-value/NC-value domain-independent method for multi-word term extraction , 1999 .

[7]  Timothy Baldwin,et al.  Translation by Machine of Complex Nominals: Getting it Right , 2004 .

[8]  Kevin Knight,et al.  Generation that Exploits Corpus-Based Statistical Knowledge , 1998, ACL.

[9]  Emmanuel Morin,et al.  French-English Terminology Extraction from Comparable Corpora , 2005, IJCNLP.

[10]  Timothy Baldwin,et al.  Noun-Noun Compound Machine Translation A Feasibility Study on Shallow Processing , 2003, Proceedings of the ACL 2003 workshop on Multiword expressions analysis, acquisition and treatment -.

[11]  Hang Li,et al.  Base Noun Phrase Translation Using Web Data and the EM Algorithm , 2002, COLING.

[12]  Reinhard Rapp,et al.  Automatic Identification of Word Translations from Unrelated English and German Corpora , 1999, ACL.