Phrase Dependency Machine Translation with Quasi-Synchronous Tree-to-Tree Features

Recent research has shown clear improvement in translation quality by exploiting linguistic syntax for either the source or target language. However, when using syntax for both languages (“tree-to-tree” translation), there is evidence that syntactic divergence can hamper the extraction of useful rules (Ding and Palmer 2005). Smith and Eisner (2006) introduced quasi-synchronous grammar, a formalism that treats non-isomorphic structure softly using features rather than hard constraints. Although quasi-synchronous grammar is a natural fit for translation modeling, its flexibility has made it challenging to build real-world systems on. In this article, we present a tree-to-tree machine translation system inspired by quasi-synchronous grammar. The core of our approach is a new model that combines phrases and dependency syntax, integrating the advantages of phrase-based and syntax-based translation. We report statistically significant improvements over a phrase-based baseline on five of seven test sets across four language pairs. We also present encouraging preliminary results on the use of unsupervised dependency parsing for syntax-based machine translation.

[1]  Liang Huang,et al.  Statistical Syntax-Directed Translation with Extended Domain of Locality , 2006, AMTA.

[2]  Yansong Feng,et al.  Title Generation with Quasi-Synchronous Grammar , 2010, EMNLP.

[3]  John DeNero,et al.  Inducing Sentence Structure from Parallel Corpora for Reordering , 2011, EMNLP.

[4]  Noah A. Smith,et al.  Structured Ramp Loss Minimization for Machine Translation , 2012, HLT-NAACL.

[5]  Shankar Kumar,et al.  Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation , 2008, EMNLP.

[6]  Sebastian Riedel,et al.  The CoNLL 2007 Shared Task on Dependency Parsing , 2007, EMNLP.

[7]  Christopher D. Manning,et al.  Quadratic-Time Dependency Parsing for Machine Translation , 2009, ACL.

[8]  Wanxiang Che,et al.  Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars , 2012, ACL.

[9]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[10]  Anders Søgaard,et al.  Empirical Lower Bounds on Aligment Error Rates in Syntax-Based Machine Translation , 2009, SSST@HLT-NAACL.

[11]  Daniel Gildea,et al.  Loosely Tree-Based Alignment for Machine Translation , 2003, ACL.

[12]  Xuanjing Huang,et al.  Phrase Dependency Parsing for Opinion Mining , 2009, EMNLP.

[13]  Slav Petrov,et al.  Coarse-to-Fine Natural Language Processing , 2011, Theory and Applications of Natural Language Processing.

[14]  John DeNero,et al.  Painless Unsupervised Learning with Features , 2010, NAACL.

[15]  Noah A. Smith,et al.  Compiling Comp Ling: Weighted Dynamic Programming and the Dyna Language , 2005, HLT.

[16]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[17]  Xavier Carreras,et al.  Non-Projective Parsing for Statistical Machine Translation , 2009, EMNLP.

[18]  Thomas L. Griffiths,et al.  Bayesian Inference for PCFGs via Markov Chain Monte Carlo , 2007, NAACL.

[19]  Philipp Koehn,et al.  Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation , 2011, EMNLP.

[20]  Jonas Kuhn,et al.  Making Ellipses Explicit in Dependency Conversion for a German Treebank , 2012, LREC.

[21]  Stanley F. Chen,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[22]  Philipp Koehn,et al.  Statistical Significance Tests for Machine Translation Evaluation , 2004, EMNLP.

[23]  Heidi Fox,et al.  Phrasal Cohesion and Statistical Machine Translation , 2002, EMNLP.

[24]  Yang Liu,et al.  Dependency Forest for Statistical Machine Translation , 2010, COLING.

[25]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[26]  David Chiang,et al.  Better k-best Parsing , 2005, IWPT.

[27]  Hermann Ney,et al.  Generation of Word Graphs in Statistical Machine Translation , 2002, EMNLP.

[28]  Kevin Knight,et al.  A Syntax-based Statistical Translation Model , 2001, ACL.

[29]  David A. Smith,et al.  Quasi-Synchronous Grammars: Alignment by Soft Projection of Syntactic Dependencies , 2006, WMT@HLT-NAACL.

[30]  Marc Dymetman,et al.  Translating with Non-contiguous Phrases , 2005, HLT.

[31]  Noah A. Smith,et al.  Rich Source-Side Context for Statistical Machine Translation , 2008, WMT@ACL.

[32]  Noah A. Smith,et al.  What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA , 2007, EMNLP.

[33]  Qun Liu,et al.  Forest-Based Translation , 2008, ACL.

[34]  Jason Eisner,et al.  Learning Non-Isomorphic Tree Mappings for Machine Translation , 2003, ACL.

[35]  Feifei Zhai,et al.  Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax , 2011, EMNLP.

[36]  Philip Resnik,et al.  Bootstrapping parsers via syntactic projection across parallel texts , 2005, Natural Language Engineering.

[37]  Michael Collins,et al.  A Discriminative Model for Tree-to-Tree Translation , 2006, EMNLP.

[38]  Dekai Wu,et al.  Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora , 1997, CL.

[39]  Sabine Buchholz,et al.  CoNLL-X Shared Task on Multilingual Dependency Parsing , 2006, CoNLL.

[40]  Wojciech Skut,et al.  An Annotation Scheme for Free Word Order Languages , 1997, ANLP.

[41]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[42]  Liang Huang,et al.  Forest Reranking: Discriminative Parsing with Non-Local Features , 2008, ACL.

[43]  Stephan Vogel,et al.  A Word-Class Approach to Labeling PSCFG Rules for Machine Translation , 2011, ACL.

[44]  Keith B. Hall,et al.  Training dependency parsers by jointly optimizing multiple objectives , 2011, EMNLP.

[45]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[46]  Noah A. Smith,et al.  Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance , 2011, EMNLP.

[47]  François Yvon,et al.  Gappy Translation Units under Left-to-Right SMT Decoding , 2009, EAMT.

[48]  Sanjeev Khudanpur,et al.  A Scalable Decoder for Parsing-Based Machine Translation with Equivalent Language Model State Maintenance , 2008, SSST@ACL.

[49]  Alfred V. Aho,et al.  Syntax Directed Translations and the Pushdown Assembler , 1969, J. Comput. Syst. Sci.

[50]  Mark Johnson,et al.  Using Universal Linguistic Knowledge to Guide Grammar Induction , 2010, EMNLP.

[51]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[52]  Mirella Lapata,et al.  Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming , 2011, EMNLP.

[53]  Chin-Yew Lin,et al.  ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation , 2004, COLING.

[54]  Yang Liu,et al.  Dependency-Based Bracketing Transduction Grammar for Statistical Machine Translation , 2010, COLING.

[55]  Roger Levy,et al.  Is it Harder to Parse Chinese, or the Chinese Treebank? , 2003, ACL.

[56]  Stephen J. Wright,et al.  Object-oriented software for quadratic programming , 2003, TOMS.

[57]  Alon Lavie,et al.  Automatic Category Label Coarsening for Syntax-Based Machine Translation , 2011, SSST@ACL.

[58]  Noah A. Smith,et al.  Concavity and Initialization for Unsupervised Dependency Parsing , 2012, NAACL.

[59]  W. Bruce Croft,et al.  A quasi-synchronous dependence model for information retrieval , 2011, CIKM.

[60]  Mark Steedman,et al.  Example Selection for Bootstrapping Statistical Parsers , 2003, NAACL.

[61]  Lucien Tesnière.  Éléments de syntaxe structurale , 1959 .

[62]  Christopher D. Manning,et al.  Accurate Non-Hierarchical Phrase-Based Translation , 2010, NAACL.

[63]  Noah A. Smith,et al.  Cube Summing, Approximate Inference with Non-Local Features, and Dynamic Programming without Semirings , 2009, EACL.

[64]  David A. Smith,et al.  Parser Adaptation and Projection with Quasi-Synchronous Grammar Features , 2009, EMNLP.

[65]  Ben Taskar,et al.  Sidestepping Intractable Inference with Structured Ensemble Cascades , 2010, NIPS.

[66]  Franz Josef Och,et al.  A Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Statistical MT , 2008, COLING.

[67]  Chris Quirk,et al.  Dependency Treelet Translation: Syntactically Informed Phrasal SMT , 2005, ACL.

[68]  Vladimir Eidelman,et al.  cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models , 2010, ACL.

[69]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[70]  Bernd Bohnet,et al.  Very high accuracy and fast dependency parsing is not a contradiction , 2010, COLING.

[71]  Stefan Ortmanns,et al.  High quality word graphs using forward-backward pruning , 1999, ICASSP.

[72]  Jonas Kuhn.  Experiments in parallel-text based grammar induction , 2004, ACL.

[73]  Kentaro Torisawa,et al.  Exploiting Subtrees in Auto-Parsed Data to Improve Dependency Parsing , 2012, Comput. Intell.

[74]  I. Dan Melamed,et al.  Multitext Grammars and Synchronous Parsers , 2003, NAACL.

[75]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[76]  Noah A. Smith,et al.  Feature-Rich Translation by Quasi-Synchronous Lattice Parsing , 2009, EMNLP.

[77]  Yang Liu,et al.  Forest-to-String Statistical Translation Rules , 2007, ACL.

[78]  Noah A. Smith,et al.  Quasi-Synchronous Phrase Dependency Grammars for Machine Translation , 2011, EMNLP.

[79]  Yuji Matsumoto,et al.  Statistical Dependency Analysis with Support Vector Machines , 2003, IWPT.

[80]  Gregory Shakhnarovich,et al.  Discriminative Re-ranking of Diverse Segmentations , 2013, CVPR.

[81]  Joakim Nivre,et al.  Dependency Parsing , 2009, Lang. Linguistics Compass.

[82]  Eric P. Xing,et al.  Turbo Parsers: Dependency Parsing by Approximate Variational Inference , 2010, EMNLP.

[83]  Qun Liu,et al.  A novel dependency-to-string model for statistical machine translation , 2011, EMNLP.

[84]  Andreas Zollmann,et al.  Syntax Augmented Machine Translation via Chart Parsing , 2006, WMT@HLT-NAACL.

[85]  Phil Blunsom,et al.  Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing , 2010, EMNLP.

[86]  Philip Resnik,et al.  Exploiting syntactic relationships in a phrase-based decoder: an exploration , 2010, Machine Translation.

[87]  Haitao Mi,et al.  Forest-based Translation Rule Extraction , 2008, EMNLP.

[88]  Jinxi Xu,et al.  A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model , 2008, ACL.

[89]  David Chiang,et al.  Learning to Translate with Source and Target Syntax , 2010, ACL.

[90]  Mark-Jan Nederhof,et al.  Squibs and Discussions: Weighted Deductive Parsing and Knuth’s Algorithm , 2003, CL.

[91]  Stefan Riezler,et al.  Grammatical Machine Translation , 2006, NAACL.

[92]  Dan Klein,et al.  Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency , 2004, ACL.

[93]  Alexandra Birch,et al.  A Quantitative Analysis of Reordering Phenomena , 2009, WMT@EACL.

[94]  Mark Steedman,et al.  Two Decades of Unsupervised POS Induction: How Far Have We Come? , 2010, EMNLP.

[95]  Noah A. Smith,et al.  Parsing with Soft and Hard Constraints on Dependency Length , 2005 .

[96]  Dan Klein,et al.  A Generative Constituent-Context Model for Improved Grammar Induction , 2002, ACL.

[97]  Eugene Charniak,et al.  Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking , 2005, ACL.

[98]  I. Dan Melamed,et al.  Empirical Lower Bounds on the Complexity of Translational Equivalence , 2006, ACL.

[99]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[100]  Stuart M. Shieber,et al.  Synchronous Tree-Adjoining Grammars , 1990, COLING.

[101]  David Yarowsky,et al.  Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora , 2001, HLT.

[102]  Qun Liu,et al.  A Dependency Treelet String Correspondence Model for Statistical Machine Translation , 2007, WMT@ACL.

[103]  Yang Liu,et al.  Improving Tree-to-Tree Translation with Packed Forests , 2009, ACL.

[104]  Christian Chiarcos,et al.  A New Hybrid Dependency Parser for German , 2009 .

[105]  Kenneth Heafield,et al.  KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.

[106]  Andreas Zollmann,et al.  Learning Multiple-Nonterminal Synchronous Grammars for Statistical Machine Translation , 2009 .

[107]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[108]  Noah A. Smith,et al.  Novel estimation methods for unsupervised discovery of latent structure in natural language text , 2007 .

[109]  Regina Barzilay,et al.  Unsupervised Multilingual Grammar Induction , 2009, ACL.

[110]  Bonnie J. Dorr,et al.  Machine Translation Divergences: A Formal Description and Proposed Solution , 1994, CL.

[111]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[112]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[113]  Valentin I. Spitkovsky,et al.  From Baby Steps to Leapfrog: How “Less is More” in Unsupervised Dependency Parsing , 2010, NAACL.

[114]  A. Lavie,et al.  Improving Syntax-Driven Translation Models by Re-structuring Divergent and Nonisomorphic Parse Tree Structures , 2008, AMTA.

[115]  David Yarowsky,et al.  Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection Across Aligned Corpora , 2001, NAACL.

[116]  Nasredine Semmar,et al.  Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent Neural Networks , 2016, COLING.

[117]  Nick Cercone,et al.  Computational Linguistics , 1986, Communications in Computer and Information Science.

[118]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[119]  Dekang Lin,et al.  A Path-based Transfer Model for Machine Translation , 2004, COLING.

[120]  Eric P. Xing,et al.  Concise Integer Linear Programming Formulations for Dependency Parsing , 2009, ACL.

[121]  Bernd Bohnet,et al.  Top Accuracy and Fast Dependency Parsing is not a Contradiction , 2010, COLING.

[122]  Percy Liang,et al.  Semi-Supervised Learning for Natural Language , 2005 .

[123]  Mark Johnson,et al.  Transforming Projective Bilexical Dependency Grammars into efficiently-parsable CFGs with Unfold-Fold , 2007, ACL.

[124]  Jason Eisner,et al.  Three New Probabilistic Models for Dependency Parsing: An Exploration , 1996, COLING.

[125]  Kevin Gimpel,et al.  Discriminative Feature-Rich Modeling for Syntax-Based Machine Translation , 2012 .

[126]  Valentin I. Spitkovsky,et al.  Unsupervised Dependency Parsing without Gold Part-of-Speech Tags , 2011, EMNLP.

[127]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[128]  Slav Petrov,et al.  Training a Parser for Machine Translation Reordering , 2011, EMNLP.

[129]  Philipp Koehn,et al.  Synthesis Lectures on Human Language Technologies , 2016 .

[130]  Christopher D. Manning,et al.  Parsing Three German Treebanks: Lexicalized and Unlexicalized Baselines , 2008 .

[131]  Yuan Ding,et al.  Machine Translation Using Probabilistic Synchronous Dependency Insertion Grammars , 2005, ACL.

[132]  Christopher D. Manning,et al.  Optimizing Chinese Word Segmentation for Machine Translation Performance , 2008, WMT@ACL.

[133]  Dan Klein,et al.  Fast Exact Inference with a Factored Model for Natural Language Parsing , 2002, NIPS.

[134]  Ben Taskar,et al.  Dependency Grammar Induction via Bitext Projection Constraints , 2009, ACL/IJCNLP.