Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation

We present Jane 2, an open source toolkit supporting both the phrase-based and the hierarchical phrase-based paradigm for statistical machine translation. It is implemented in C++ and provides efficient decoding algorithms and data structures. This paper describes its phrase-based functionality. In addition to the standard pipeline, including phrase extraction and parameter optimization, Jane 2 contains several state-of-the-art extensions and tools. Forced alignment phrase training can considerably reduce rule table size while learning the translation scores in a more principled manner. Word class language models can be used to integrate longer context with a reduced vocabulary size. Rule table interpolation is applicable to different tasks, e.g., domain adaptation. The decoder distinguishes between lexical and coverage pruning and applies reordering constraints for efficiency.
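Rule table interpolation can be illustrated with a minimal sketch. It assumes plain linear interpolation of a single phrase translation probability, p(e|f) = λ·p_in(e|f) + (1 − λ)·p_out(e|f), over an in-domain and an out-of-domain table held in Python dictionaries. The function name `interpolate_tables`, the dictionary layout, and the weight `lam` are hypothetical and are not Jane 2's file format or options; Jane 2 may combine tables differently (for example per feature or log-linearly).

```python
from typing import Dict, Tuple

# Hypothetical layout: (source phrase f, target phrase e) -> p(e | f)
PhraseTable = Dict[Tuple[str, str], float]


def interpolate_tables(in_domain: PhraseTable, out_domain: PhraseTable,
                       lam: float) -> PhraseTable:
    """Linearly interpolate p(e|f) as lam * p_in + (1 - lam) * p_out.

    A phrase pair missing from one table contributes probability 0 from it.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return {
        pair: lam * in_domain.get(pair, 0.0) + (1.0 - lam) * out_domain.get(pair, 0.0)
        for pair in set(in_domain) | set(out_domain)
    }


if __name__ == "__main__":
    in_dom = {("haus", "house"): 0.8, ("haus", "home"): 0.2}
    out_dom = {("haus", "house"): 0.6, ("haus", "building"): 0.4}
    # Prints building 0.12, home 0.14, house 0.74 (sorted by target phrase).
    for pair, p in sorted(interpolate_tables(in_dom, out_dom, lam=0.7).items()):
        print(pair, round(p, 2))
```

If both input tables are normalized over target phrases for each source phrase, the interpolated table is normalized as well, because each entry is a convex combination of the two inputs.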
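The distinction between lexical and coverage pruning, together with a reordering constraint, can be sketched under a few assumptions. The sketch assumes a beam search in which hypotheses with equal cardinality (number of covered source words) share one stack; lexical pruning keeps the best hypotheses per coverage vector, and coverage pruning keeps the best coverage vectors per stack. For reordering it uses an IBM-style constraint, where a new phrase may start only at one of the first k uncovered source positions, as one common choice; whether Jane 2 uses exactly this form is an assumption. All names and parameters (`Hypothesis`, `prune_stack`, `lexical_beam`, `coverage_beam`, `allowed_start_positions`, `k`) are hypothetical and not Jane 2's interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Hypothesis:
    coverage: FrozenSet[int]  # source positions translated so far
    lm_state: str             # stand-in for the language model history
    score: float              # log-probability, higher is better


def prune_stack(hyps: List[Hypothesis], lexical_beam: int,
                coverage_beam: int) -> List[Hypothesis]:
    """Prune one stack whose hypotheses all cover the same number of source words."""
    if lexical_beam < 1 or coverage_beam < 1:
        raise ValueError("beam sizes must be positive")
    by_coverage: Dict[FrozenSet[int], List[Hypothesis]] = defaultdict(list)
    for h in hyps:
        by_coverage[h.coverage].append(h)
    # Lexical pruning: keep the best `lexical_beam` hypotheses per coverage vector.
    pruned = [sorted(group, key=lambda x: x.score, reverse=True)[:lexical_beam]
              for group in by_coverage.values()]
    # Coverage pruning: rank coverage vectors by their best hypothesis and keep
    # only the best `coverage_beam` of them.
    ranked = sorted(pruned, key=lambda group: group[0].score, reverse=True)
    return [h for group in ranked[:coverage_beam] for h in group]


def allowed_start_positions(coverage: FrozenSet[int], src_len: int, k: int) -> List[int]:
    """IBM-style reordering constraint: a new phrase may start only at one of
    the first k uncovered source positions."""
    uncovered = [j for j in range(src_len) if j not in coverage]
    return uncovered[:k]


if __name__ == "__main__":
    stack = [
        Hypothesis(frozenset({0}), "a", -1.0),
        Hypothesis(frozenset({0}), "b", -2.0),
        Hypothesis(frozenset({0}), "c", -3.0),
        Hypothesis(frozenset({1}), "a", -1.5),
        Hypothesis(frozenset({2}), "a", -5.0),
    ]
    # Keeps two hypotheses for coverage {0} and one for {1}; coverage {2} is pruned.
    print(prune_stack(stack, lexical_beam=2, coverage_beam=2))
    print(allowed_start_positions(frozenset({0, 2}), src_len=5, k=2))  # [1, 3]
```

Separating the two levels lets a decoder keep several lexically different hypotheses (e.g., different language model histories) for a promising coverage vector without letting one coverage vector crowd out all others in the stack.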
