Syntax- and semantic-based reordering in hierarchical phrase-based statistical machine translation

A syntax-based reordering model (RM) for an SMT system is proposed. Our RM predicts the orientation between syntactic dependants of the source sentence. We enrich the proposed RM with semantic features so that it can perform semantic generalization. Our RM outperforms the baseline and two competing RMs in terms of BLEU and TER.

We present a syntax-based reordering model (RM) for hierarchical phrase-based statistical machine translation (HPB-SMT) enriched with semantic features. Our model makes three novel contributions: (i) while the previous dependency-based RM is limited to the reordering of head and dependant constituent pairs, we also model the reordering of pairs of dependants; (ii) our model is enriched with semantic features (WordNet synsets) so that the reordering model can generalize to pairs that were not seen in training but have equivalent meaning; (iii) we evaluate our model on two language directions, English-to-Farsi and English-to-Turkish. These language pairs are particularly challenging because the target languages have free word order, rich morphology, and few resources.

We evaluate our RM both intrinsically (accuracy of the RM classifier) and extrinsically (MT). Our best configuration outperforms the baseline classifier by 5–29% on pairs of dependants and by 12–30% on head and dependant pairs, while the improvement on MT ranges between 1.6% and 5.5% relative in terms of BLEU, depending on language pair and domain. We also analyze the values of the feature weights to gain further insight into the impact of the reordering-related features in the HPB-SMT model. We observe that the features of our RM are assigned significant weights and that they are complementary to the reordering feature included by default in the HPB-SMT model.
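To make the orientation prediction described above more concrete, the following is a minimal Python sketch of how training examples for such an RM might be derived from a dependency parse and a word alignment. It is not the authors' implementation: the function names (`orientation`, `training_pairs`, `pair_features`), the two-way monotone/swap label set, the one-to-one alignment dictionary, and the toy synset identifiers are all assumptions made for illustration. Real word alignments are many-to-many, and real synsets would come from WordNet.

```python
from itertools import combinations


def orientation(src_i, src_j, alignment):
    """Label a pair of source positions by comparing source and target order.

    `alignment` maps a source index to a single target index (a simplifying
    assumption). Returns None when the pair cannot be labelled.
    """
    tgt_i = alignment.get(src_i)
    tgt_j = alignment.get(src_j)
    if tgt_i is None or tgt_j is None or tgt_i == tgt_j:
        return None
    same_order = (src_i < src_j) == (tgt_i < tgt_j)
    return "monotone" if same_order else "swap"


def training_pairs(heads, alignment):
    """Yield (kind, i, j, label) for head-dependant and dependant-dependant pairs.

    `heads[k]` is the index of the head of source word k, or -1 for the root.
    """
    children = {}
    for dep, head in enumerate(heads):
        if head >= 0:
            children.setdefault(head, []).append(dep)
    for head, deps in sorted(children.items()):
        # Pairs made of a head and one of its dependants.
        for dep in deps:
            label = orientation(head, dep, alignment)
            if label is not None:
                yield ("head-dep", head, dep, label)
        # Pairs made of two dependants that share the same head.
        for a, b in combinations(deps, 2):
            label = orientation(a, b, alignment)
            if label is not None:
                yield ("dep-dep", a, b, label)


def pair_features(words, i, j, synsets):
    """Lexical features plus a synset feature when both words have one."""
    feats = [f"w1={words[i]}", f"w2={words[j]}"]
    s1, s2 = synsets.get(words[i]), synsets.get(words[j])
    if s1 and s2:
        feats.append(f"syn={s1}|{s2}")
    return feats


# Toy example: an SVO source sentence aligned to a verb-final target order.
words = ["she", "reads", "books"]
heads = [1, -1, 1]                 # "reads" is the root
alignment = {0: 0, 1: 2, 2: 1}     # the verb moves to the end on the target side
# Hypothetical synset identifiers, not real WordNet entries.
TOY_SYNSETS = {"reads": "SYN_READ", "books": "SYN_TEXT", "novels": "SYN_TEXT"}

pairs = list(training_pairs(heads, alignment))
```

In this toy setup the synset feature is identical for "reads books" and "reads novels", which illustrates how a classifier trained on one pair could carry over to an unseen pair with an equivalent meaning.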
