Reordering rules for English-Hindi SMT

Reordering is pre-processing stage for Statistical Machine Translation (SMT) system where the words of the source sentence are reordered as per the syntax of the target language. We are proposing a rich set of rules for better reordering. The idea is to facilitate the training process by better alignments and parallel phrase extraction for a phrase based SMT system. Reordering also helps the decoding process and hence improving the machine translation quality. We have observed significant improvements in the translation quality by using our approach over the baseline SMT. We have used BLEU, NIST, multi-reference word error rate, multi-reference position independent error rate for judging the improvements. We have exploited open source SMT toolkit MOSES to develop the system.

[1]  Christopher D. Manning,et al.  Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.

[2]  José A. R. Fonollosa,et al.  Syntax-based reordering for statistical machine translation , 2011, Comput. Speech Lang..

[3]  Pushpak Bhattacharyya,et al.  Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation , 2008, IJCNLP.

[4]  Hermann Ney,et al.  Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information , 2004, CL.

[5]  Philipp Koehn,et al.  Clause Restructuring for Statistical Machine Translation , 2005, ACL.

[6]  John Cocke,et al.  A Statistical Approach to Machine Translation , 1990, CL.

[7]  Alexandra Birch,et al.  Metrics for MT evaluation: evaluating reordering , 2010, Machine Translation.

[8]  Hermann Ney,et al.  Statistical Machine Translation with a Small Amount of Bilingual Training Data , 2006 .

[9]  Kenneth Heafield,et al.  KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.

[10]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[11]  Sergei Nirenburg,et al.  A Statistical Approach to Machine Translation , 2003 .

[12]  P. V. S. Avinesh,et al.  Does Syntactic Knowledge help English-Hindi SMT? , 2014, ArXiv.

[13]  Hermann Ney,et al.  An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research , 2000, LREC.

[14]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[15]  Rohit Gupta,et al.  Learning Improved Reordering Models for Urdu, Farsi and Italian using SMT , 2012, SMT@COLING.

[16]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[17]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[18]  Daniel Marcu,et al.  A Phrase-Based,Joint Probability Model for Statistical Machine Translation , 2002, EMNLP.

[19]  M. Ferguson,et al.  Automatic Evaluation , 2009 .

[20]  F ChenStanley,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[21]  Hermann Ney,et al.  Accelerated DP based search for statistical translation , 1997, EUROSPEECH.

[22]  George R. Doddington,et al.  Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics , 2002 .

[23]  Alon Lavie,et al.  The Meteor metric for automatic evaluation of machine translation , 2009, Machine Translation.

[24]  Chao Wang,et al.  Chinese Syntactic Reordering for Statistical Machine Translation , 2007, EMNLP.

[25]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[26]  Irina Galinskaya,et al.  Building a reordering system using tree-to-string hierarchical model , 2012, SMT@COLING.

[27]  Fei Xia,et al.  Improving a Statistical MT System with Automatically Learned Rewrite Patterns , 2004, COLING.