Jane: Open Source Hierarchical Translation, Extended with Reordering and Lexicon Models

We present Jane, RWTH's hierarchical phrase-based translation system, which has been released as open source for the scientific community. The system has been in development at RWTH for the last two years and has been successfully applied in various machine translation evaluations. It includes extensions to the hierarchical approach developed by RWTH and by other research institutions. In this paper we give an overview of its main features. We also introduce a novel reordering model for the hierarchical phrase-based approach that further improves translation performance, and we analyze the effect of some recent extended lexicon models on the performance of the system.
