Adaptation of the translation model for statistical machine translation based on information retrieval

In this paper we present experiments concerning translation model adaptation for statistical machine translation. We develop a method to adapt translation models using in- formation retrieval. The approach selects sentences similar to the test set to form an adapted training corpus. The method allows a better use of additionally available out-of-domain training data or finds in-domain data in a mixed corpus. The adapted translation models significantly improve the translation performance compared to competitive baseline sys- tems.

[1]  Hermann Ney,et al.  HMM-Based Word Alignment in Statistical Translation , 1996, COLING.

[2]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[3]  Y. Zhang,et al.  Integrated phrase segmentation and alignment algorithm for statistical machine translation , 2003, International Conference on Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003.

[4]  Hua Wu,et al.  Improving Domain-Specific Word Alignment for Computer Assisted Translation , 2004, ACL.

[5]  Alexander H. Waibel,et al.  Language Model Adaptation for Statistical Machine Translation Based on Information Retrieval , 2004, LREC.

[6]  George R. Doddington,et al.  Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics , 2002 .

[7]  Stephan Vogel,et al.  Language Model Adaptation for Statistical Machine Translation via Structured Query Models , 2004, COLING.

[8]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[9]  Wu Hua,et al.  Improving domain-specific word alignment for computer assisted translation , 2004, ACL 2004.

[10]  Eiichiro Sumita,et al.  Toward a Broad-coverage Bilingual Corpus for Speech Translation of Travel Conversations in the Real World , 2002, LREC.

[11]  Sanjeev Khudanpur,et al.  Language model adaptation using cross-lingual information , 2003, INTERSPEECH.

[12]  Xuedong Huang,et al.  Improved topic-dependent language modeling using information retrieval techniques , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).