Example-based machine translation based on tree–string correspondence and statistical generation

This paper describes an example-based machine translation (EBMT) method based on tree–string correspondence (TSC) and statistical generation. In this method, each translation example is represented as a TSC: a triple consisting of a parse tree in the source language, a string in the target language, and the correspondences between leaf nodes of the source-language tree and substrings of the target-language string. An input sentence is first parsed into a tree. The system then searches for the TSC forest that best matches the input tree. Finally, the translation is generated by a statistical generation model that combines the target-language strings of the TSCs. The generation model uses three features: the semantic similarity between the tree in a TSC and the input tree, the probability of translating a source word into a target word, and the language-model probability of the target-language string. Using this method, we build an English-to-Chinese MT system. Experimental results indicate that its performance is comparable to that of phrase-based statistical MT systems.
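The TSC triple and the combination of generation features described above can be sketched in Python. This is a minimal illustrative sketch, not the authors' implementation. The abstract does not say how the three features are combined, so the log-linear form (a weighted sum of log feature values) is an assumption. The names `Tree`, `TSC`, `log_linear_score`, and `best_candidate` are hypothetical, as is the encoding of each leaf-to-substring link as a target-side `(start, end)` span. The search for the best-matching TSC forest and the computation of the feature values themselves are not shown.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Tree:
    """Minimal parse-tree node; a node without children is a leaf (a word)."""
    label: str
    children: list["Tree"] = field(default_factory=list)

    def leaves(self) -> list[str]:
        # Collect the words at the leaves, left to right.
        if not self.children:
            return [self.label]
        words: list[str] = []
        for child in self.children:
            words.extend(child.leaves())
        return words


@dataclass
class TSC:
    """Tree-string correspondence: source tree, target string, and leaf links."""
    source_tree: Tree
    target: list[str]
    # Hypothetical encoding: source leaf index -> (start, end) span in target,
    # with end exclusive. Unlinked leaves (e.g. English articles) are omitted.
    links: dict[int, tuple[int, int]]

    def target_for_leaf(self, leaf_index: int) -> list[str]:
        # Return the target substring linked to the given source leaf.
        start, end = self.links[leaf_index]
        return self.target[start:end]


def log_linear_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed log-linear model: sum of weight * log(feature value).

    Feature values must be positive (similarities and probabilities in (0, 1]).
    """
    return sum(weights[name] * math.log(value) for name, value in features.items())


def best_candidate(candidates: list[tuple[str, dict[str, float]]],
                   weights: dict[str, float]) -> str:
    # Return the candidate translation with the highest model score.
    return max(candidates, key=lambda c: log_linear_score(c[1], weights))[0]


if __name__ == "__main__":
    # "the cat sleeps" as a toy parse tree; words are the leaves.
    tree = Tree("S", [
        Tree("NP", [Tree("DT", [Tree("the")]), Tree("NN", [Tree("cat")])]),
        Tree("VP", [Tree("VBZ", [Tree("sleeps")])]),
    ])
    tsc = TSC(tree, ["猫", "在", "睡觉"], {1: (0, 1), 2: (1, 3)})
    print(tree.leaves())           # ['the', 'cat', 'sleeps']
    print(tsc.target_for_leaf(2))  # ['在', '睡觉']

    # Toy feature values: semantic similarity, translation prob., LM prob.
    weights = {"sim": 1.0, "trans": 1.0, "lm": 1.0}
    candidates = [
        ("A", {"sim": 0.9, "trans": 0.5, "lm": 0.01}),
        ("B", {"sim": 0.4, "trans": 0.2, "lm": 0.001}),
    ]
    print(best_candidate(candidates, weights))  # A
```

Working in log space keeps the product of small probabilities numerically stable, and the per-feature weights are where a tuning procedure would plug in; with all weights set to 1.0 the score reduces to the log of the plain product of the three feature values.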
