Syntax-based language models for statistical machine translation

The goal of machine translation is to develop algorithms that produce human-quality translations of natural language sentences. The evaluation of machine translation quality is split broadly into two aspects: adequacy and fluency. Adequacy measures how faithfully the meaning of the original sentence is preserved, whereas fluency measures whether this meaning is expressed in valid sentences in the target language. While both of these criteria are difficult to meet, fluency is a much more difficult goal. This likely stems from the asymmetry between producing and understanding sentences: although humans are quite robust at inferring the meaning of text even in the presence of considerable noise and error, the rules that govern grammatical utterances are exacting, subtle, and elusive. To produce understandable text, we can rely on this robust processing hardware, but to produce grammatical text, we have to understand how it works. This dissertation attempts to improve the fluency of machine translation output by explicitly incorporating models of the target language structure into machine translation systems. It is organized into three parts. First, we propose a framework for decoding that decouples the structures of the sentences of the source and target languages, and evaluate it with existing grammatical models as language models for machine translation. Next, we apply lessons from that task to the learning of grammars more suitable to the demands of machine translation. We then incorporate these grammars, called Tree Substitution Grammars, into our decoding framework.
