A tree-to-tree model for statistical machine translation

In this thesis, we take a statistical tree-to-tree approach to machine translation (MT). In such an approach, the source-language input is first parsed into a syntactic tree structure, and the source-language tree is then mapped to a target-language tree. This kind of approach has several advantages. First, parsing the input generates valuable information about its meaning. Second, the mapping from a source-language tree to a target-language tree offers a mechanism for preserving the meaning of the input. Finally, producing a target-language tree helps to ensure the grammaticality of the output. A main focus of this thesis is the development of a statistical tree-to-tree mapping algorithm. Our solution involves a novel representation called an aligned extended projection, or AEP. The AEP, inspired by ideas in linguistic theory related to tree-adjoining grammars, is a parse-tree-like structure that models clause-level phenomena such as verbal argument structure and lexical word order. The AEP also contains alignment information that links the source-language input to the target-language output. Instead of learning a mapping from a source-language tree to a target-language tree, the AEP-based approach learns a mapping from a source-language tree to a target-language AEP. Because the AEP is a complex structure, learning a mapping from parse trees to AEPs is a challenging machine learning problem, which we address with a linear structured prediction model. A human evaluation of the AEP-based translation approach on a German-to-English task shows significant improvements in the grammaticality of translations. This thesis also presents a statistical parser for Spanish that could be used as part of a Spanish/English translation system. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
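
The idea of a linear structured prediction model can be illustrated with a small sketch. The abstract does not specify the features, the search procedure, or the training algorithm, so everything below is an assumption made for illustration: the candidate lists, the position-based feature map, and the perceptron-style update are hypothetical stand-ins, not the thesis's actual method. Each candidate output is scored as a dot product between a weight vector and a feature vector, and the highest-scoring candidate is predicted.

```python
# Minimal sketch of a linear structured prediction model.
# ASSUMPTIONS (not from the thesis): outputs are chosen from a small explicit
# candidate list, features pair tokens at matching positions, and weights are
# trained with a perceptron-style update.

def features(source, candidate):
    """Hypothetical feature map: count (source token, output token) pairs by position."""
    counts = {}
    for pair in zip(source, candidate):
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def score(weights, source, candidate):
    """Linear score: dot product of the weight vector and the feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in features(source, candidate).items())

def predict(weights, source, candidates):
    """Return the highest-scoring candidate (ties go to the earliest candidate)."""
    return max(candidates, key=lambda c: score(weights, source, c))

def train(examples, epochs=5):
    """Perceptron-style training: on a mistake, move weights toward the gold output."""
    weights = {}
    for _ in range(epochs):
        for source, candidates, gold in examples:
            guess = predict(weights, source, candidates)
            if guess != gold:
                for f, v in features(source, gold).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in features(source, guess).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights

# Toy data, invented for this sketch: (source, candidate outputs, gold output).
examples = [
    (("ich", "sehe"), [("see", "I"), ("I", "see")], ("I", "see")),
    (("du", "siehst"), [("see", "you"), ("you", "see")], ("you", "see")),
]
weights = train(examples)
```

In a realistic system the output would be a structured object such as an AEP rather than a token tuple, and the candidate set would be far too large to enumerate, so some form of search would replace the explicit list used here.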
