Forest to String Based Statistical Machine Translation with Hybrid Word Alignments

Forest to String Based Statistical Machine Translation (FSBSMT) is a forest-based tree-sequence-to-string translation model for syntax-based statistical machine translation. The model automatically learns tree-sequence-to-string translation rules from word alignments estimated on a bilingual parallel corpus whose source side has been parsed. This paper presents a hybrid method that combines the output of different word alignment methods and integrates the result into an FSBSMT system. The hybrid word alignment supplies the FSBSMT system with the most informative alignment links. We show that integrating hybrid word alignment into various experimental settings of FSBSMT yields considerable improvements over a state-of-the-art Hierarchical Phrase-Based SMT (HPBSMT) system. The research also demonstrates that additionally integrating Named Entities (NEs), their translations, and Example-Based Machine Translation (EBMT) phrases, all extracted from the bilingual parallel training data, brings further considerable improvements over the hybrid FSBSMT system. We apply our hybrid model to a distant language pair, English–Bengali. The proposed system achieves a 78.5% relative (9.84 BLEU points absolute) improvement over the baseline HPBSMT system.
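The abstract does not say how the individual alignments are merged, so the following is only a minimal sketch of one generic way to combine several word aligners' outputs: a vote over alignment links. It is not the paper's hybrid method. It assumes each alignment is a set of (source index, target index) links written in the common `i-j` text format used by Moses alignment files. The function names `parse_links` and `combine_alignments` and the three example alignments are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def parse_links(line: str) -> Set[Link]:
    """Parse one sentence's alignment written as 'i-j i-j ...'."""
    links = set()
    for pair in line.split():
        src, tgt = pair.split("-")
        links.add((int(src), int(tgt)))
    return links


def combine_alignments(alignments: List[Set[Link]], min_votes: int) -> Set[Link]:
    """Keep every link proposed by at least `min_votes` aligners.

    Hypothetical voting scheme: min_votes == 1 yields the union (high recall),
    min_votes == len(alignments) yields the intersection (high precision).
    """
    votes: Counter = Counter()
    for links in alignments:
        votes.update(links)  # each aligner votes at most once per link (sets)
    return {link for link, n in votes.items() if n >= min_votes}


if __name__ == "__main__":
    # Hypothetical outputs of three different aligners for one sentence pair.
    a = parse_links("0-0 1-2 2-1")
    b = parse_links("0-0 1-2 2-2")
    c = parse_links("0-0 1-1 2-1")
    print(sorted(combine_alignments([a, b, c], min_votes=2)))
    # -> [(0, 0), (1, 2), (2, 1)]
```

Varying `min_votes` trades precision against recall. The majority setting shown here keeps only links that at least two of the three aligners agree on. A real system would more likely choose links using richer evidence than a plain count.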
