A Guide to Jane, an Open Source Hierarchical Translation Toolkit

A Guide to Jane, an Open Source Hierarchical Translation Toolkit Jane is RWTH's hierarchical phrase-based translation toolkit. It includes tools for phrase extraction, translation and scaling factor optimization, with efficient and documented programs of which large parts can be parallelized. The decoder features syntactic enhancements, reorderings, triplet models, discriminative word lexica, and support for a variety of language model formats. In this article, we will review the main features of Jane and explain the overall architecture. We will also indicate where and how new models can be included.

[1]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[2]  Hermann Ney,et al.  Analysing soft syntax features and heuristics for hierarchical phrase based machine translation. , 2008, IWSLT.

[3]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[4]  C. E. SHANNON,et al.  A mathematical theory of communication , 1948, MOCO.

[5]  Jean-Cédric Chappelier,et al.  A Generalized CYK Algorithm for Parsing Stochastic CFG , 1998, TAPD.

[6]  Andreas Zollmann,et al.  Syntax Augmented Machine Translation via Chart Parsing , 2006, WMT@HLT-NAACL.

[7]  M. Rey,et al.  11 , 001 New Features for Statistical Machine Translation , 2009 .

[8]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[9]  John A. Nelder,et al.  A Simplex Method for Function Minimization , 1965, Comput. J..

[10]  Daniel Zeman Using TectoMT as a Preprocessing Tool for Phrase-Based Statistical Machine Translation , 2010, TSD.

[11]  Pushpak Bhattacharyya,et al.  Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation , 2008, IJCNLP.

[12]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[13]  William H. Press,et al.  Numerical recipes in C , 2002 .

[14]  Miles Osborne,et al.  Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap , 2007, EMNLP.

[15]  Lane Schwartz,et al.  Reproducible Results in Parsing-Based Machine Translation: The JHU Shared Task Submission , 2010, WMT@ACL.

[16]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[17]  Hermann Ney,et al.  On LM Heuristics for the Cube Growing Algorithm , 2009, EAMT.

[18]  Philip Koehn,et al.  Statistical Machine Translation , 2010, EAMT.

[19]  Markus Freitag,et al.  The RWTH Aachen Machine Translation System for WMT 2012 , 2013, WMT@ACL.

[20]  Chris Callison-Burch,et al.  Open Source Toolkit for Statistical Machine Translation: Factored Translation Models and Lattice Decoding , 2006 .

[21]  Noah A. Smith,et al.  Preference Grammars: Softening Syntactic Constraints to Improve Statistical Machine Translation , 2009, NAACL.

[22]  Hermann Ney,et al.  Training Phrase Translation Models with Leaving-One-Out , 2010, ACL.

[23]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[24]  Wolfgang Macherey,et al.  Lattice-based Minimum Error Rate Training for Statistical Machine Translation , 2008, EMNLP.

[25]  Chris Callison-Burch,et al.  Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies , 2010, WMT@ACL.

[26]  David Chiang,et al.  Forest Rescoring: Faster Decoding with Integrated Language Models , 2007, ACL.

[27]  Jinxi Xu,et al.  A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model , 2008, ACL.

[28]  Hermann Ney,et al.  The RWTH Aachen Machine Translation System for WMT 2010 , 2010, WMT@ACL.

[29]  Hermann Ney,et al.  Extending Statistical Machine Translation with Discriminative and Trigger-Based Lexicon Models , 2009, EMNLP.

[30]  Hermann Ney,et al.  Jane: Open Source Hierarchical Translation, Extended with Reordering and Lexicon Models , 2010, WMT@ACL.

[31]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.