Collocation Translation Acquisition Using Monolingual Corpora

Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods that rely on bilingual parallel corpora, this paper presents a new method that acquires collocation translations from monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency triple translation model is estimated with the EM algorithm, based on a dependency correspondence assumption. The resulting triple translation model is used to extract collocation translations from the two monolingual corpora. Experiments show that our approach outperforms existing monolingual-corpus-based methods in dependency triple translation and achieves promising results in collocation translation extraction.
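
To make the pipeline concrete, the following is a minimal sketch of how a triple translation model could rank candidate translations. It is not the paper's implementation. It assumes that a Chinese triple (head, relation, dependent) translates into an English triple with a corresponding relation, that head and dependent translate independently, and that word translation probabilities are uniform over dictionary candidates (a placeholder for the values EM would estimate). The toy triples, the pinyin dictionary, the relation map, and the function name `rank_translations` are all hypothetical.

```python
from collections import Counter
from itertools import product

# Toy English dependency triples (head, relation, dependent); hypothetical data
# standing in for the output of a dependency parser over an English corpus.
english_triples = [
    ("watch", "verb-object", "tv"),
    ("watch", "verb-object", "tv"),
    ("watch", "verb-object", "film"),
    ("see", "verb-object", "film"),
    ("read", "verb-object", "book"),
]

# Hypothetical bilingual dictionary: Chinese word (pinyin) -> English candidates.
dictionary = {
    "kan": ["watch", "see", "read"],
    "dianshi": ["tv"],
}

# Assumed one-to-one mapping between Chinese and English relation labels.
relation_map = {"verb-object": "verb-object"}


def rank_translations(c_triple, triples, dictionary, relation_map):
    """Rank English candidates for a Chinese triple by p(e_triple) * p(c_triple | e_triple)."""
    counts = Counter(triples)
    total = sum(counts.values())
    c_head, c_rel, c_dep = c_triple
    e_rel = relation_map[c_rel]
    heads = dictionary[c_head]
    deps = dictionary[c_dep]
    scored = []
    for e_head, e_dep in product(heads, deps):
        e_triple = (e_head, e_rel, e_dep)
        # English triple probability from monolingual corpus counts.
        p_e = counts[e_triple] / total
        # Placeholder word translation probabilities (uniform over candidates);
        # in an EM setup these would be re-estimated iteratively.
        p_c_given_e = (1.0 / len(heads)) * (1.0 / len(deps))
        scored.append((e_triple, p_e * p_c_given_e))
    # Highest score first; Python's sort is stable, so ties keep candidate order.
    return sorted(scored, key=lambda item: -item[1])


ranked = rank_translations(
    ("kan", "verb-object", "dianshi"), english_triples, dictionary, relation_map
)
print(ranked[0])
```

In this toy setting only ("watch", "verb-object", "tv") occurs in the English triples, so it ranks first; the other dictionary candidates receive a score of zero because the English corpus never pairs them with "tv".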
