Decoding in Joshua: Open Source, Parsing-Based Machine Translation

Decoding in Joshua: Open Source, Parsing-Based Machine Translation We describe a scalable decoder for parsing-based machine translation. The decoder is written in Java and implements all the essential algorithms described in (Chiang, 2007) and (Li and Khudanpur, 2008b): chart-parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. Additionally, parallel and distributed computing techniques are exploited to make it scalable. We demonstrate experimentally that our decoder is more than 30 times faster than a baseline decoder written in Python.

[1]  David Chiang,et al.  Forest Rescoring: Faster Decoding with Integrated Language Models , 2007, ACL.

[2]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[3]  Wolfgang Macherey,et al.  Lattice-based Minimum Error Rate Training for Statistical Machine Translation , 2008, EMNLP.

[4]  Yang Liu,et al.  Tree-to-String Alignment Template for Statistical Machine Translation , 2006, ACL.

[5]  Miles Osborne,et al.  Randomised Language Modelling for Statistical Machine Translation , 2007, ACL.

[6]  Adam Lopez,et al.  Hierarchical Phrase-Based Translation with Suffix Arrays , 2007, EMNLP.

[7]  Daniel Marcu,et al.  Scalable Inference and Training of Context-Rich Syntactic Translation Models , 2006, ACL.

[8]  Sanjeev Khudanpur,et al.  A Scalable Decoder for Parsing-Based Machine Translation with Equivalent Language Model State Maintenance , 2008, SSST@ACL.

[9]  Haizhou Li,et al.  I2r multi-pass machine translation system for IWSLT 2008 , 2008, IWSLT.

[10]  S. Khudanpur,et al.  Large-scale Discriminative n-gram Language Models for Statistical Machine Translation , 2008, AMTA.

[11]  Omar Zaidan,et al.  Z-MERT: A Fully Configurable Open Source Tool for Minimum Error Rate Training of Machine Translation Systems , 2009, Prague Bull. Math. Linguistics.

[12]  Barry Haddow,et al.  Improved Minimum Error Rate Training in Moses , 2009, Prague Bull. Math. Linguistics.

[13]  M. F.,et al.  Bibliography , 1985, Experimental Gerontology.

[14]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[15]  Liang Huang,et al.  Statistical Syntax-Directed Translation with Extended Domain of Locality , 2006, AMTA.

[16]  David Chiang,et al.  An Introduction to Synchronous Grammars , 2006 .

[17]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[18]  David Chiang,et al.  Better k-best Parsing , 2005, IWPT.

[19]  Stephan Vogel,et al.  An Efficient Two-Pass Approach to Synchronous-CFG Driven Statistical MT , 2007, NAACL.

[20]  Philip Resnik,et al.  Soft Syntactic Constraints for Hierarchical Phrased-Based Translation , 2008, ACL.

[21]  Chris Callison-Burch,et al.  Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases , 2005, ACL.

[22]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[23]  Jason Eisner,et al.  Learning Non-Isomorphic Tree Mappings for Machine Translation , 2003, ACL.

[24]  Chris Quirk,et al.  Dependency Treelet Translation: Syntactically Informed Phrasal SMT , 2005, ACL.

[25]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[26]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.