Latent-Variable Synchronous CFGs for Hierarchical Translation

Data-driven refinement of non-terminal categories has been demonstrated to be a reliable technique for improving monolingual parsing with PCFGs. In this paper, we extend these techniques to learn latent refinements of single-category synchronous grammars, so as to improve translation performance. We compare two estimators for this latent-variable model: one based on EM and the other a spectral algorithm based on the method of moments. We evaluate their performance on a Chinese–English translation task. The results indicate that both approaches achieve significant gains over the baseline, and that the moments-based estimator in particular is faster than EM and also performs better.
