Building Bilingual Lexicons using Lexical Translation Probabilities via Pivot Languages

This paper proposes a method for increasing the size of a bilingual lexicon that is built from two other bilingual lexicons via a pivot language. This approach faces two main challenges, the “ambiguity” and the “mismatch” of terms; we target the latter by improving the utilization ratio of the source bilingual lexicons. Given two bilingual lexicons for the language pairs Lf-Lp and Lp-Le, we compute lexical translation probabilities of word pairs using a statistical word-alignment model together with term decomposition/composition techniques. We compare three approaches to generating the bilingual lexicon: “exact merging”, “word-based merging”, and our proposed “alignment-based merging”. In our method, we combine lexical translation probabilities with a simple language model to estimate the probabilities of translation pairs. The experimental results show that our method drastically increases the number of translation terms compared with the other two methods. We also evaluate and discuss the quality of the translation outputs.
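To make the contrast between merging strategies concrete, here is a minimal Python sketch. It is illustrative only and is not the authors' implementation. The function names (`exact_merge`, `chain_probs`, `score_candidates`), the dict-of-dicts probability tables, chaining word probabilities by summing over pivot words, the assumption that target words keep the source word order, and the bigram language model with a fixed floor for unseen bigrams are all hypothetical choices made for this example. The sketch shows why exact merging finds nothing when the pivot terms do not match as strings, and one way composed candidates could be ranked by lexical translation probabilities and a language model.

```python
from collections import defaultdict
from itertools import product


def exact_merge(lex_fp, lex_pe):
    """Exact merging: pair Lf and Le terms whose Lp translations are identical strings."""
    by_pivot = defaultdict(set)
    for p, e in lex_pe:
        by_pivot[p].add(e)
    return {(f, e) for f, p in lex_fp for e in by_pivot.get(p, ())}


def chain_probs(t_fp, t_pe):
    """Chain word-level tables through the pivot: P(e|f) = sum_p P(p|f) * P(e|p).

    Assumption: each table maps a word to {translation: probability}, e.g. as
    estimated by a word-alignment model trained on one bilingual lexicon.
    """
    t_fe = defaultdict(lambda: defaultdict(float))
    for f, pivots in t_fp.items():
        for p, p_given_f in pivots.items():
            for e, e_given_p in t_pe.get(p, {}).items():
                t_fe[f][e] += p_given_f * e_given_p
    return t_fe


def score_candidates(src_words, t_fe, bigram, top_k=3, floor=1e-6):
    """Compose a target term word by word (source word order kept, an assumption)
    and score it as the product of lexical probabilities times a bigram LM score."""
    options = [list(t_fe.get(w, {}).items()) for w in src_words]
    if any(not opts for opts in options):
        return []  # some source word has no translation, so no candidate is built
    scored = []
    for combo in product(*options):
        words = [e for e, _ in combo]
        score = 1.0
        for _, prob in combo:
            score *= prob  # lexical translation probabilities
        for prev, cur in zip(["<s>"] + words, words + ["</s>"]):
            score *= bigram.get((prev, cur), floor)  # crude floor for unseen bigrams
        scored.append((" ".join(words), score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical toy data: Lf = romanized Japanese, Lp = English, Le = romanized Chinese.
    lex_fp = {("jinkou chinou", "artificial intelligence")}
    lex_pe = {("AI", "rengong zhineng")}  # pivot term mismatch: "AI" vs. the full form
    print(exact_merge(lex_fp, lex_pe))  # empty set: exact merging finds no pair

    t_fp = {"jinkou": {"artificial": 1.0}, "chinou": {"intelligence": 1.0}}
    t_pe = {"artificial": {"rengong": 0.8, "renzao": 0.2},
            "intelligence": {"zhineng": 0.9, "qingbao": 0.1}}
    bigram = {("<s>", "rengong"): 0.5, ("rengong", "zhineng"): 0.6,
              ("zhineng", "</s>"): 0.5}
    t_fe = chain_probs(t_fp, t_pe)
    print(score_candidates(["jinkou", "chinou"], t_fe, bigram))
```

In this toy run, the word-level route still proposes "rengong zhineng" as the top candidate even though exact merging returns nothing, because the pivot strings never need to match as whole terms. A real system would work with log probabilities and a smoothed language model rather than the fixed floor used here.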
