Exploring System Combination Approaches for Indo-Aryan MT Systems

Statistical Machine Translation (SMT) systems depend heavily on the quality of the parallel corpora used to train their translation models. Translation quality between certain Indian languages is often poor because good-quality training data is scarce. We used triangulation to improve translation quality in cases where the direct translation model did not perform satisfactorily. Triangulation uses a third language as a pivot between the source and target languages, which in most cases yields an improved and more efficient translation model. We also combined models built through multiple pivot languages using a linear mixture and obtained significant BLEU improvements over the direct source-target models.
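
The two techniques named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each phrase table is a single-direction map from a phrase to translation probabilities. Real Moses phrase tables carry several scores per entry, such as inverse probabilities and lexical weights. Triangulation marginalizes over pivot phrases, p(t|s) = Σ_p p(t|p)·p(p|s), which treats source and target as independent given the pivot phrase. The linear mixture combines tables as p(t|s) = Σ_i λ_i·p_i(t|s), with weights that sum to 1. The function names `triangulate` and `linear_mixture`, the placeholder phrases and the fixed weights are hypothetical. In practice the mixture weights would typically be tuned on held-out data.

```python
from collections import defaultdict

# Hypothetical representation: phrase -> {translation: probability}, one direction only.
PhraseTable = dict[str, dict[str, float]]


def triangulate(src_to_pivot: PhraseTable, pivot_to_tgt: PhraseTable) -> PhraseTable:
    """Pivot two tables: p(t|s) = sum over pivot phrases p of p(t|p) * p(p|s)."""
    result: PhraseTable = {}
    for s, pivots in src_to_pivot.items():
        scores: dict[str, float] = defaultdict(float)
        for p, p_given_s in pivots.items():
            # A pivot phrase missing from the second table contributes nothing.
            for t, t_given_p in pivot_to_tgt.get(p, {}).items():
                scores[t] += t_given_p * p_given_s
        if scores:  # keep only source phrases that reached some target phrase
            result[s] = dict(scores)
    return result


def linear_mixture(tables: list[PhraseTable], weights: list[float]) -> PhraseTable:
    """Mix tables: p(t|s) = sum_i w_i * p_i(t|s); a missing entry counts as 0."""
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    mixed = defaultdict(lambda: defaultdict(float))
    for table, w in zip(tables, weights):
        for s, targets in table.items():
            for t, prob in targets.items():
                mixed[s][t] += w * prob
    return {s: dict(targets) for s, targets in mixed.items()}


if __name__ == "__main__":
    # Toy tables with placeholder phrases; not real Indo-Aryan data.
    direct = {"s1": {"t1": 0.6, "t2": 0.4}}
    src_to_pivot = {"s1": {"p1": 0.7, "p2": 0.3}}
    pivot_to_tgt = {"p1": {"t1": 0.9, "t2": 0.1}, "p2": {"t2": 1.0}}

    tri = triangulate(src_to_pivot, pivot_to_tgt)
    # t1: 0.9*0.7 = 0.63; t2: 0.1*0.7 + 1.0*0.3 = 0.37
    print({t: round(p, 3) for t, p in tri["s1"].items()})

    # Equal weights for the direct and pivot tables (placeholder values).
    mixed = linear_mixture([direct, tri], [0.5, 0.5])
    # t1: 0.5*0.6 + 0.5*0.63 = 0.615; t2: 0.5*0.4 + 0.5*0.37 = 0.385
    print({t: round(p, 3) for t, p in mixed["s1"].items()})
```

For a multi-pivot setup, each pivot language would yield its own triangulated table. Those tables would go into the same `linear_mixture` call alongside the direct table, with one weight per table.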
