Phrase-Based Statistical Machine Translation Using Approximate Matching

Phrase-based statistical models constitute one of the most competitive pattern-recognition approaches to machine translation. In this case, the source sentence is fragmented into phrases, then, each phrase is translated by using a stochastic dictionary. One shortcoming of this phrase-based model is that it does not have an adequate generalization capability. If a sequence of words has not been seen in training, it cannot be translated as a whole phrase. In this paper we try to overcome this drawback. The basic idea is that if a source phrase is not in our dictionary (has not been seen in training), we look for the most similar in our dictionary and try to adapt its translation to the source phrase. We are using the well known edit distance as a measure of similarity. We present results from an English-Spanish task (XRCE).

[1]  Marti A. Hearst,et al.  HLT-NAACL 2003 : Human Language Technology conference of the North American Chapter of the Association for Computational Linguistics : proceedings of the main conference : May 27 to June 1, 2003, Edmonton, Alberta, Canada , 2003 .

[2]  Hermann Ney,et al.  Discriminative Training and Maximum Entropy Models for Statistical Machine Translation , 2002, ACL.

[3]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[4]  Hermann Ney,et al.  The Alignment Template Approach to Statistical Machine Translation , 2004, CL.

[5]  Hermann Ney,et al.  Bootstrap estimates for confidence intervals in ASR performance evaluation , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[7]  Hermann Ney,et al.  Phrase-Based Statistical Machine Translation , 2002, KI.

[8]  Gerhard Lakemeyer,et al.  KI 2002: Advances in Artificial Intelligence , 2002, Lecture Notes in Computer Science.

[9]  Paolo Tiberio,et al.  Searching Similar (Sub)Sentences for Example-Based Machine Translation , 2002, SEBD.

[10]  Tom E. Bishop,et al.  Blind Image Restoration Using a Block-Stationary Signal Model , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[11]  Francisco Casacuberta,et al.  MONOTONE STATISTICAL TRANSLATION USING WORD GROUPS , 2001 .

[12]  Francisco Casacuberta,et al.  Combining Phrase-Based and Template-Based Alignment Models in Statistical Translation , 2003, IbPRIA.

[13]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[14]  Graham A Stephen,et al.  Approximate String Matching , 1994, Encyclopedia of Algorithms.

[15]  H. Ney,et al.  A novel string-to-string distance measure with applications to machine translation evaluation , 2003, MTSUMMIT.

[16]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[17]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .