Joint Parsing and Alignment with Weakly Synchronized Grammars

Syntactic machine translation systems extract rules from bilingual, word-aligned, syntactically parsed text, but current systems for parsing and word alignment are at best cascaded and at worst totally independent of one another. This work presents a unified joint model for simultaneous parsing and word alignment. To flexibly model syntactic divergence, we develop a discriminative log-linear model over two parse trees and an ITG derivation which is encouraged but not forced to synchronize with the parses. Our model gives absolute improvements of 3.3 F1 for English parsing, 2.1 F1 for Chinese parsing, and 5.5 F1 for word alignment over each task's independent baseline, giving the best reported results for both Chinese-English word alignment and joint parsing on the parallel portion of the Chinese treebank. We also show an improvement of 1.2 BLEU in downstream MT evaluation over basic HMM alignments.
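
As a rough sketch of the kind of model the abstract describes (the notation is assumed here for illustration and is not taken from the paper): let $s, s'$ be the sentence pair, $t, t'$ the two parse trees, and $a$ the ITG derivation. A generic log-linear model with weights $\theta$ and features $\phi$ has the form below. Splitting the score into monolingual parse terms, an alignment term, and a synchronization term is a hypothetical decomposition, meant only to show how agreement between the derivation and the parses can be rewarded rather than required.

```latex
% Generic log-linear model over two trees and an ITG derivation (assumed notation).
\[
P_\theta(t, a, t' \mid s, s') =
  \frac{\exp\bigl(\theta^\top \phi(t, a, t', s, s')\bigr)}
       {\sum_{\tilde t,\, \tilde a,\, \tilde t'}
        \exp\bigl(\theta^\top \phi(\tilde t, \tilde a, \tilde t', s, s')\bigr)}
\]

% Hypothetical feature decomposition: the last term scores agreement
% between the ITG derivation and the two parses.
\[
\theta^\top \phi(t, a, t', s, s') =
  \theta_{F}^\top \phi_{F}(t, s)
  + \theta_{E}^\top \phi_{E}(t', s')
  + \theta_{A}^\top \phi_{A}(a, s, s')
  + \theta_{\mathrm{sync}}^\top \phi_{\mathrm{sync}}(t, a, t')
\]
```

Under this decomposition, a derivation that disagrees with the parses forgoes the synchronization score but still receives nonzero probability, which is one way to read "encouraged but not forced to synchronize".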
