Training a Parser for Machine Translation Reordering

We propose a simple training regime that can improve the extrinsic performance of a parser, given only a corpus of sentences and a way to automatically evaluate the extrinsic quality of a candidate parse. We apply our method to train parsers that excel when used as part of a reordering component in a statistical machine translation system. We use a corpus of weakly-labeled reference reorderings to guide parser training. Our best parsers contribute significant improvements in subjective translation quality while their intrinsic attachment scores typically regress.

[1]  Dan Klein,et al.  Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[2]  Nizar Habash Syntactic preprocessing for statistical machine translation , 2007, MTSUMMIT.

[3]  Peng Xu,et al.  Using a Dependency Parser to Improve SMT for Subject-Object-Verb Languages , 2009, NAACL.

[4]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[5]  Kevin Duh,et al.  Head Finalization: A Simple Reordering Rule for SOV Languages , 2010, WMT@ACL.

[6]  Jun'ichi Tsujii,et al.  Task-oriented Evaluation of Syntactic Parsers and Their Representations , 2008, ACL.

[7]  Ben Taskar,et al.  Posterior Regularization for Structured Latent Variable Models , 2010, J. Mach. Learn. Res..

[8]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[9]  Daniel Marcu,et al.  What’s in a translation rule? , 2004, NAACL.

[10]  Dan Klein,et al.  Learning Better Monolingual Models with Unannotated Bilingual Text , 2010, CoNLL.

[11]  Ming-Wei Chang,et al.  Driving Semantic Parsing from the World’s Response , 2010, CoNLL.

[12]  Daniel Gildea,et al.  Corpus Variation and Parser Performance , 2001, EMNLP.

[13]  Fei Xia,et al.  Improving a Statistical MT System with Automatically Learned Rewrite Patterns , 2004, COLING.

[14]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[15]  Eugene Charniak,et al.  Effective Self-Training for Parsing , 2006, NAACL.

[16]  Stephen Clark,et al.  A Tale of Two Parsers: Investigating and Combining Graph-based and Transition-based Dependency Parsing , 2008, EMNLP.

[17]  Slav Petrov,et al.  Uptraining for Accurate Deterministic Question Parsing , 2010, EMNLP.

[18]  Dmitriy Genzel,et al.  Automatically Learning Source-side Reordering Rules for Large Scale Machine Translation , 2010, COLING.

[19]  Dan Klein,et al.  Improved Inference for Unlexicalized Parsing , 2007, NAACL.

[20]  Hermann Ney,et al.  Discriminative Reordering Models for Statistical Machine Translation , 2006, WMT@HLT-NAACL.

[21]  Alon Lavie,et al.  The Meteor metric for automatic evaluation of machine translation , 2009, Machine Translation.

[22]  Ben Taskar,et al.  An End-to-End Discriminative Approach to Machine Translation , 2006, ACL.

[23]  Thorsten Brants,et al.  TnT – A Statistical Part-of-Speech Tagger , 2000, ANLP.

[24]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[25]  Ming-Wei Chang,et al.  Structured Output Learning with Indirect Supervision , 2010, ICML.

[26]  Christopher D. Manning,et al.  Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.

[27]  Kevin Knight,et al.  A Syntax-based Statistical Translation Model , 2001, ACL.

[28]  Joakim Nivre,et al.  Algorithms for Deterministic Incremental Dependency Parsing , 2008, CL.

[29]  Philipp Koehn,et al.  Clause Restructuring for Statistical Machine Translation , 2005, ACL.

[30]  Chao Wang,et al.  Chinese Syntactic Reordering for Statistical Machine Translation , 2007, EMNLP.

[31]  Koby Crammer,et al.  Online Large-Margin Training of Dependency Parsers , 2005, ACL.

[32]  Kevin Duh,et al.  Automatic Evaluation of Translation Quality for Distant Language Pairs , 2010, EMNLP.

[33]  Hiroshi Ichikawa,et al.  A Lightweight Evaluation Framework for Machine Translation Reordering , 2011, WMT@EMNLP.

[34]  Jennifer Foster "cba to check the spelling": Investigating Parser Performance on Discussion Forum Posts , 2010, HLT-NAACL.

[35]  Keith B. Hall,et al.  Training dependency parsers by jointly optimizing multiple objectives , 2011, EMNLP.

[36]  Alexandra Birch,et al.  LRscore for Evaluating Lexical and Reordering Quality in MT , 2010, WMT@ACL.

[37]  Eugene Charniak,et al.  A Maximum-Entropy-Inspired Parser , 2000, ANLP.

[38]  Ming-Wei Chang,et al.  Guiding Semi-Supervision with Constraint-Driven Learning , 2007, ACL.

[39]  Josef van Genabith,et al.  QuestionBank: Creating a Corpus of Parse-Annotated Questions , 2006, ACL.

[40]  Dan Klein,et al.  Two Languages are Better than One (for Syntactic Parsing) , 2008, EMNLP.