Sampling Tree Fragments from Forests

We study the problem of sampling trees from forests in the setting where the probability of each tree may be a function of arbitrarily large tree fragments. This setting extends recent work on sampling to learn Tree Substitution Grammars (TSGs) to the case where the tree structure (the TSG derived tree) is not fixed. We develop a Markov chain Monte Carlo algorithm that corrects for the bias introduced by unbalanced forests, and we present experiments using the algorithm to learn Synchronous Context-Free Grammar rules for machine translation. In this application, the forests being sampled represent the set of Hiero-style rules that are consistent with fixed input word-level alignments. We demonstrate machine translation performance equivalent to that of standard techniques, but with much smaller grammars.
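To make the bias from unbalanced forests concrete, the sketch below uses a toy forest and a generic Metropolis-Hastings independence sampler. It is an illustration under stated assumptions, not the algorithm developed in the paper: the forest, the names `FOREST`, `propose`, and `mh_sample`, and the uniform target over trees are all hypothetical, whereas the paper's tree probabilities depend on tree fragments. Choosing uniformly among the alternatives at each node favors trees in sparse parts of the forest, and the acceptance step corrects for that.

```python
import random
from collections import Counter

# Hypothetical toy forest: each node maps to its alternatives (hyperedges),
# and each alternative is a tuple of child nodes. Leaves have one empty
# alternative. The forest is unbalanced: one tree passes through "a",
# three pass through "b".
FOREST = {
    "root": [("a",), ("b",)],
    "a": [()],
    "b": [("b1",), ("b2",), ("b3",)],
    "b1": [()],
    "b2": [()],
    "b3": [()],
}

def propose(node, rng):
    """Sample a tree top-down, choosing uniformly among the alternatives
    at each node. Returns (tree, probability of proposing that tree)."""
    alternatives = FOREST[node]
    choice = alternatives[rng.randrange(len(alternatives))]
    prob = 1.0 / len(alternatives)
    children = []
    for child in choice:
        subtree, sub_prob = propose(child, rng)
        children.append(subtree)
        prob *= sub_prob
    return (node, tuple(children)), prob

def mh_sample(n_steps, rng):
    """Independence Metropolis-Hastings with an assumed uniform target
    over trees. The target ratio is 1, so the acceptance probability
    reduces to min(1, q(current) / q(candidate))."""
    current, q_current = propose("root", rng)
    samples = []
    for _ in range(n_steps):
        candidate, q_candidate = propose("root", rng)
        if rng.random() < min(1.0, q_current / q_candidate):
            current, q_current = candidate, q_candidate
        samples.append(current)
    return samples

if __name__ == "__main__":
    rng = random.Random(0)
    n = 40000
    naive = Counter(propose("root", rng)[0] for _ in range(n))
    corrected = Counter(mh_sample(n, rng))
    for tree in sorted(naive):
        print(tree, naive[tree] / n, corrected[tree] / n)
```

In this toy forest the naive top-down sampler proposes the tree through `a` with probability 1/2 and each tree through `b` with probability 1/6, while the corrected chain has the uniform distribution over all four trees as its stationary distribution. A non-uniform target would replace the implicit ratio of 1 with the ratio of the two tree probabilities.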
