Paper Information - A Scalable Decoder for Parsing-Based Machine Translation with Equivalent Language Model State Maintenance

We describe a scalable decoder for parsing-based machine translation. The decoder is written in Java and implements all the essential algorithms described in Chiang (2007): chart-parsing, m-gram language model integration, beam- and cube-pruning, and unique k-best extraction. Additionally, parallel and distributed computing techniques are exploited to make it scalable. We also propose an algorithm to maintain equivalent language model states that exploits the back-off property of m-gram language models: instead of maintaining a separate state for each distinguished sequence of "state" words, we merge multiple states that can be made equivalent for language model probability calculations due to back-off. We demonstrate experimentally that our decoder is more than 30 times faster than a baseline decoder written in Python. We propose to release our decoder as an open-source toolkit.
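The state-merging idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm or code: it is a toy Python back-off model (the described decoder is written in Java), it covers only the right-side context words, and every name in it (`ToyBackoffLM`, `equivalent_right_state`, the probability tables) is hypothetical. The assumption is that a leading context word can be dropped whenever the context is not a prefix of any stored m-gram and carries no back-off weight, because every future probability lookup would back off past that word anyway.

```python
from typing import Dict, Set, Tuple

Ngram = Tuple[str, ...]


class ToyBackoffLM:
    """Toy back-off m-gram model storing log-probabilities (illustrative only)."""

    def __init__(self, probs: Dict[Ngram, float], backoffs: Dict[Ngram, float],
                 unk_logprob: float = -10.0) -> None:
        self.probs = probs          # log p(last word | preceding words)
        self.backoffs = backoffs    # log back-off weight per context
        self.unk_logprob = unk_logprob
        # Contexts that can still influence a lookup: proper prefixes of
        # stored m-grams, plus every prefix of a context with a back-off weight.
        self.live_prefixes: Set[Ngram] = set()
        for ngram in probs:
            for k in range(1, len(ngram)):
                self.live_prefixes.add(ngram[:k])
        for ctx in backoffs:
            for k in range(1, len(ctx) + 1):
                self.live_prefixes.add(ctx[:k])

    def logprob(self, word: str, context: Ngram) -> float:
        """Standard back-off lookup: use the m-gram if stored, else back off."""
        ngram = context + (word,)
        if ngram in self.probs:
            return self.probs[ngram]
        if not context:
            return self.unk_logprob
        return self.backoffs.get(context, 0.0) + self.logprob(word, context[1:])

    def equivalent_right_state(self, context: Ngram) -> Ngram:
        """Drop leading words that no future lookup can ever make use of."""
        state = context
        while state and state not in self.live_prefixes:
            state = state[1:]
        return state


# Hypothetical trigram model, for demonstration only.
lm = ToyBackoffLM(
    probs={
        ("the",): -1.0, ("cat",): -2.0, ("dog",): -2.2, ("sat",): -2.5,
        ("the", "cat"): -0.5, ("cat", "sat"): -0.7,
        ("the", "cat", "sat"): -0.3,
    },
    backoffs={("the",): -0.4, ("cat",): -0.3, ("the", "cat"): -0.2},
)

# Two different histories collapse to the same state because the model
# has no m-gram that begins with "dog cat" or "sat cat".
state_a = lm.equivalent_right_state(("dog", "cat"))
state_b = lm.equivalent_right_state(("sat", "cat"))
```

In this toy model, `state_a` and `state_b` are the same single-word state, so a chart item ending in "dog cat" and one ending in "sat cat" could share one language model state. A context such as `("the", "cat")` is kept whole because the trigram "the cat sat" is stored.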

Sanjeev Khudanpur | Zhifei Li

[1] Thorsten Brants, et al. Large Language Models in Machine Translation, 2007, EMNLP.

[2] Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.

[3] Yang Liu, et al. Tree-to-String Alignment Template for Statistical Machine Translation, 2006, ACL.

[4] David Chiang, et al. Forest Rescoring: Faster Decoding with Integrated Language Models, 2007, ACL.

[5] Ying Zhang, et al. Distributed Language Modeling for N-best List Re-ranking, 2006, EMNLP.

[6] Chris Quirk, et al. Dependency Treelet Translation: Syntactically Informed Phrasal SMT, 2005, ACL.

[7] Jason Eisner, et al. Learning Non-Isomorphic Tree Mappings for Machine Translation, 2003, ACL.

[8] Stephan Vogel, et al. An Efficient Two-Pass Approach to Synchronous-CFG Driven Statistical MT, 2007, NAACL.

[9] David Chiang, et al. Hierarchical Phrase-Based Translation, 2007, CL.

[10] Adam Lopez, et al. Hierarchical Phrase-Based Translation with Suffix Arrays, 2007, EMNLP.

[11] David Chiang, et al. Better k-best Parsing, 2005, IWPT.

[12] Hermann Ney, et al. Improved Statistical Alignment Models, 2000, ACL.

[13] Liang Huang, et al. Statistical Syntax-Directed Translation with Extended Domain of Locality, 2006, AMTA.

[14] David Chiang, et al. An Introduction to Synchronous Grammars, 2006.

[15] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.

[16] Daniel Marcu, et al. Scalable Inference and Training of Context-Rich Syntactic Translation Models, 2006, ACL.

[17] Ahmad Emami, et al. Large-Scale Distributed Language Modeling, 2007, ICASSP.

[18] Stuart M. Shieber, et al. Principles and Implementation of Deductive Parsing, 1994, J. Log. Program.

[19] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.