Translating Collocation using Monolingual and Parallel Corpus

In this paper, we propose a method for translating a given verb-noun collocation using a parallel corpus and an additional monolingual corpus. Our approach uses two models to generate collocation translations. The combination translation model translates the collocate and the base word separately, combines the results, and filters the combined translations with a target language model trained on the monolingual corpus. The bidirectional alignment translation model generates translations from bidirectional word alignment information in the parallel corpus. At run time, each model produces a list of translation candidates, and the candidates from the two lists are re-ranked and returned as the system output. We describe an implementation of the method using the Hong Kong Parallel Text corpus. The experimental results show that our method improves the quality of the top-ranked collocation translations, which could be used to assist ESL learners and editors of bilingual dictionaries.

Keywords: collocation, statistical machine translation, computer-assisted translation
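As a rough illustration of the final step, in which the two candidate lists are merged and re-ranked, the following minimal sketch may help. The abstract does not say how candidates are scored or combined, so everything here is an assumption made for illustration: the function name `rerank`, the interpolation parameter `weight`, and the choice of normalizing each list before taking a weighted sum. The example candidates and scores are invented toy values, not results from the paper.

```python
from collections import defaultdict


def rerank(combination_candidates, alignment_candidates, weight=0.5, top_n=5):
    """Merge two scored candidate lists and return the top-ranked translations.

    Each input is a list of (translation, score) pairs with non-negative
    scores. `weight` is a hypothetical interpolation parameter: the share
    given to the combination model; the alignment model gets 1 - weight.
    """

    def normalize(candidates):
        # Sum scores per translation, then scale so each list sums to 1,
        # which makes scores from the two models comparable.
        totals = defaultdict(float)
        for translation, score in candidates:
            totals[translation] += score
        norm = sum(totals.values())
        if norm <= 0:
            return {}
        return {t: s / norm for t, s in totals.items()}

    combined = defaultdict(float)
    for translation, score in normalize(combination_candidates).items():
        combined[translation] += weight * score
    for translation, score in normalize(alignment_candidates).items():
        combined[translation] += (1.0 - weight) * score

    # Highest combined score first; ties are broken alphabetically.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Toy candidates with made-up scores, for illustration only.
    from_combination = [("cand_a", 3.0), ("cand_b", 1.0)]
    from_alignment = [("cand_b", 3.0), ("cand_c", 1.0)]
    for translation, score in rerank(from_combination, from_alignment):
        print(f"{translation}\t{score:.3f}")
```

A candidate proposed by both models accumulates score from each list, so agreement between the models pushes it up the ranking. A linear interpolation is only one plausible way to combine the lists; the method described in the paper may differ.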
