Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model

In hierarchical phrase-based machine translation, a rule table is automatically learned by heuristically extracting synchronous rules from a parallel corpus. As a result, a spuriously large number of rules is extracted, many of which may be incorrect. The larger rule table increases decoding time and may lower translation quality. To resolve these problems, we propose a hierarchical back-off model for Hiero grammar, an instance of a synchronous context-free grammar (SCFG), on the basis of the hierarchical Pitman-Yor process. The model can extract a compact rule and phrase table without resorting to any heuristics by hierarchically backing off to smaller phrases under the SCFG. Inference is carried out efficiently using the two-step synchronous parsing of Xiao et al. (2012) combined with slice sampling. In our experiments, the proposed model achieved higher or at least comparable translation quality against a previous Bayesian model on various language pairs: German-English, French-English, Spanish-English, and Japanese-English. When compared against heuristic models, our model achieved comparable translation quality on the full-size German-English language pair of the Europarl v7 corpus with a significantly smaller grammar: less than 10% of the size of the heuristic model's.
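The back-off idea in the abstract can be illustrated with a minimal sketch of a Pitman-Yor predictive probability, where the base distribution of one restaurant is itself another Pitman-Yor process. This is not the paper's actual model: it uses a common one-table-per-type simplification, the hyperparameters and the uniform base over "smaller phrases" are invented for illustration, and the example rules are hypothetical.

```python
from collections import defaultdict

class PYP:
    """Simplified Pitman-Yor restaurant (one table per observed type).

    prob(x) mixes the cached count of x with the back-off base
    distribution, which may itself be another PYP -- giving a
    hierarchical back-off chain as sketched in the abstract.
    """
    def __init__(self, discount, strength, base):
        self.d = discount        # discount parameter, 0 <= d < 1
        self.theta = strength    # strength (concentration) parameter
        self.base = base         # back-off distribution: rule -> probability
        self.counts = defaultdict(int)
        self.n = 0               # total observations ("customers")

    def prob(self, rule):
        tables = len(self.counts)            # one table per type (simplification)
        c = self.counts.get(rule, 0)
        t = 1 if c > 0 else 0
        p_cache = (max(c - self.d * t, 0.0) / (self.theta + self.n)) if self.n else 0.0
        p_new = (self.theta + self.d * tables) / (self.theta + self.n)
        return p_cache + p_new * self.base(rule)

    def observe(self, rule):
        self.counts[rule] += 1
        self.n += 1

# Hypothetical two-level hierarchy: full rules back off to a phrase-level
# PYP, which backs off to a uniform base (stand-in for the distribution
# over smaller phrase pairs under the SCFG; vocabulary size is assumed).
uniform = lambda rule: 1.0 / 1000
phrase_pyp = PYP(discount=0.5, strength=1.0, base=uniform)
rule_pyp = PYP(discount=0.8, strength=1.0, base=phrase_pyp.prob)

rule_pyp.observe(("ne X pas", "not X"))
p_seen = rule_pyp.prob(("ne X pas", "not X"))   # boosted by its cached count
p_unseen = rule_pyp.prob(("chat", "cat"))       # falls back to the base chain
```

An observed rule keeps most of its probability mass from its count, while an unseen rule receives only the discounted mass routed through the back-off chain, so `p_seen > p_unseen`.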

[1] Alon Lavie et al. Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability. ACL, 2011.

[2] Franz Josef Och et al. Minimum Error Rate Training in Statistical Machine Translation. ACL, 2003.

[3] Chris Callison-Burch et al. Open Source Toolkit for Statistical Machine Translation: Factored Translation Models and Lattice Decoding. 2006.

[4] Radford M. Neal. Slice Sampling. The Annals of Statistics, 2003.

[5] Chris Dyer et al. A Gibbs Sampler for Phrasal Synchronous Grammar Induction. ACL, 2009.

[6] John DeNero et al. Sampling Alignment Structure under a Bayesian Translation Model. EMNLP, 2008.

[7] Dekai Wu et al. Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora. CL, 1997.

[8] Philipp Koehn et al. Moses: Open Source Toolkit for Statistical Machine Translation. ACL, 2007.

[9] Daniel Marcu et al. A Phrase-Based, Joint Probability Model for Statistical Machine Translation. EMNLP, 2002.

[10] Andreas Stolcke et al. SRILM - an extensible language modeling toolkit. INTERSPEECH, 2002.

[11] Xiaochang Peng et al. Type-based MCMC for Sampling Tree Fragments from Forests. EMNLP, 2014.

[12] Kai Zhao et al. Minibatch and Parallelization for Online Large Margin Structured Learning. NAACL, 2013.

[13] Phil Blunsom et al. Inducing Synchronous Grammars with Slice Sampling. NAACL, 2010.

[14] Yee Whye Teh et al. A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes. ACL, 2006.

[15] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods. 2011.

[16] Philipp Koehn et al. Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation. WMT@ACL, 2010.

[17] David Chiang et al. Hierarchical Phrase-Based Translation. CL, 2007.

[18] J. Pitman et al. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. 1997.

[19] Salim Roukos et al. Bleu: a Method for Automatic Evaluation of Machine Translation. ACL, 2002.

[20] Vladimir Eidelman et al. cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models. ACL, 2010.

[21] Yee Whye Teh et al. Beam sampling for the infinite hidden Markov model. ICML, 2008.

[22] Daniel Marcu et al. Statistical Phrase-Based Translation. NAACL, 2003.

[23] Hermann Ney et al. A Systematic Comparison of Various Statistical Alignment Models. CL, 2003.

[24] Gholamreza Haffari et al. An Infinite Hierarchical Bayesian Model of Phrasal Translation. ACL, 2013.

[25] Joel D. Martin et al. Improving Translation Quality by Discarding Most of the Phrasetable. EMNLP, 2007.

[26] David Chiang et al. Hope and Fear for Discriminative Training of Statistical Translation Models. J. Mach. Learn. Res., 2012.

[27] Chris Dyer et al. A Bayesian Model for Learning SCFGs with Discontiguous Rules. EMNLP, 2012.

[28] John DeNero et al. Why Generative Phrase Models Underperform Surface Heuristics. WMT@HLT-NAACL, 2006.

[29] Wang Ling et al. Entropy-based Pruning for Phrase-based Machine Translation. EMNLP, 2012.

[30] Taro Watanabe et al. An Unsupervised Model for Joint Phrase Alignment and Extraction. ACL, 2011.

[31] Eiichiro Sumita et al. Overview of the Patent Machine Translation Task at the NTCIR-10 Workshop. NTCIR, 2011.

[32] Yang Liu et al. Unsupervised Discriminative Induction of Synchronous Grammar for Machine Translation. COLING, 2012.

[33] Mark Hopkins et al. Tuning as Ranking. EMNLP, 2011.

[34] Peng Xu et al. A Systematic Comparison of Phrase Table Pruning Techniques. EMNLP, 2012.

[35] Daniel Gildea et al. Sampling Tree Fragments from Forests. Computational Linguistics, 2014.

[36] Bowen Zhou et al. Two Methods for Extending Hierarchical Rules from the Bilingual Chart Parsing. COLING, 2010.