Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model

In hierarchical phrase-based machine translation, the rule table is learned automatically by heuristically extracting synchronous rules from a parallel corpus. As a result, spuriously many rules are extracted, many of which are incorrect. The resulting large rule table increases decoding time and may lower translation quality. To address these problems, we propose a hierarchical back-off model for Hiero grammar, an instance of a synchronous context-free grammar (SCFG), based on the hierarchical Pitman-Yor process. The model extracts a compact rule and phrase table without resorting to heuristics by hierarchically backing off to smaller phrases under the SCFG. Inference is carried out efficiently using the two-step synchronous parsing of Xiao et al. (2012) combined with slice sampling. In our experiments, the proposed model achieved translation quality higher than or at least comparable to a previous Bayesian model on several language pairs: German/French/Spanish/Japanese-English. Compared with heuristic models, our model achieved comparable translation quality on the full-size German-English pair of the Europarl v7 corpus with a significantly smaller grammar: less than 10% the size of the heuristic model's.
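
To make the back-off idea concrete, the following is a minimal sketch of the Pitman-Yor predictive distribution that hierarchical models of this kind build on; the notation (r, c, t, d, \theta, P_{\mathrm{bo}}) is illustrative rather than taken from the paper. Under a Pitman-Yor process with discount d and strength \theta, the probability of generating a rule r given the current Chinese-restaurant seating arrangement is

P(r) = \frac{c_r - d\, t_r}{\theta + c_{\cdot}} + \frac{\theta + d\, t_{\cdot}}{\theta + c_{\cdot}}\, P_{\mathrm{bo}}(r),

where c_r and t_r are the customer and table counts associated with r, c_{\cdot} and t_{\cdot} are the corresponding totals, and P_{\mathrm{bo}} is the back-off (base) distribution. In a hierarchical back-off model of the kind described above, P_{\mathrm{bo}}(r) would itself be given by Pitman-Yor-distributed models over smaller constituents (for example, the sub-phrases obtained by splitting r), recursing until a terminal base measure over words is reached.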

[1] Alon Lavie, et al. Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability, 2011, ACL.

[2] Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.

[3] Radford M. Neal. Slice Sampling, 2003, The Annals of Statistics.

[4] Chris Dyer, et al. A Gibbs Sampler for Phrasal Synchronous Grammar Induction, 2009, ACL.

[5] David Chiang, et al. Hierarchical Phrase-Based Translation, 2007, CL.

[6] Yee Whye Teh, et al. Beam Sampling for the Infinite Hidden Markov Model, 2008, ICML '08.

[7] Dekai Wu, et al. Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora, 1997, CL.

[8] J. Pitman, et al. The Two-Parameter Poisson-Dirichlet Distribution Derived from a Stable Subordinator, 1997.

[9] Gholamreza Haffari, et al. An Infinite Hierarchical Bayesian Model of Phrasal Translation, 2013, ACL.

[10] Yang Liu, et al. Unsupervised Discriminative Induction of Synchronous Grammar for Machine Translation, 2012, COLING.

[11] Mark Hopkins, et al. Tuning as Ranking, 2011, EMNLP.

[12] Vladimir Eidelman, et al. cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models, 2010, ACL.

[13] Philipp Koehn, et al. Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation, 2010, WMT@ACL.

[14] Peng Xu, et al. A Systematic Comparison of Phrase Table Pruning Techniques, 2012, EMNLP.

[15] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[16] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[17] Daniel Gildea, et al. Sampling Tree Fragments from Forests, 2014, Computational Linguistics.

[18] Hermann Ney, et al. A Systematic Comparison of Various Statistical Alignment Models, 2003, CL.

[19] Bowen Zhou, et al. Two Methods for Extending Hierarchical Rules from the Bilingual Chart Parsing, 2010, COLING.

[20] Andreas Stolcke, et al. SRILM - An Extensible Language Modeling Toolkit, 2002, INTERSPEECH.

[21] Taro Watanabe, et al. An Unsupervised Model for Joint Phrase Alignment and Extraction, 2011, ACL.

[22] John DeNero, et al. Why Generative Phrase Models Underperform Surface Heuristics, 2006, WMT@HLT-NAACL.

[23] Kai Zhao, et al. Minibatch and Parallelization for Online Large Margin Structured Learning, 2013, NAACL.

[24] Phil Blunsom, et al. Inducing Synchronous Grammars with Slice Sampling, 2010, NAACL.

[25] Eiichiro Sumita, et al. Overview of the Patent Machine Translation Task at the NTCIR-10 Workshop, 2011, NTCIR.

[26] Yee Whye Teh, et al. A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes, 2006, ACL.

[27] Joel D. Martin, et al. Improving Translation Quality by Discarding Most of the Phrasetable, 2007, EMNLP.

[28] David Chiang, et al. Hope and Fear for Discriminative Training of Statistical Translation Models, 2012, J. Mach. Learn. Res.

[29] Daniel Marcu, et al. A Phrase-Based, Joint Probability Model for Statistical Machine Translation, 2002, EMNLP.

[30] Xiaochang Peng, et al. Type-based MCMC for Sampling Tree Fragments from Forests, 2014, EMNLP.

[31] Chris Dyer, et al. A Bayesian Model for Learning SCFGs with Discontiguous Rules, 2012, EMNLP.

[32] John DeNero, et al. Sampling Alignment Structure under a Bayesian Translation Model, 2008, EMNLP.

[33] Daniel Marcu, et al. Statistical Phrase-Based Translation, 2003, NAACL.