Refinements in hierarchical phrase-based translation systems

The hierarchical phrase-based translation model for statistical machine translation (SMT), proposed relatively recently, has achieved state-of-the-art performance in numerous recent translation evaluations. Hierarchical phrase-based systems comprise a pipeline of modules with complex interactions. In this thesis, we propose refinements to the hierarchical phrase-based model, as well as improvements to and analyses of various modules of hierarchical phrase-based systems. We take advantage of the increasing amounts of training data available for machine translation, and of existing frameworks for distributed computing, to build better infrastructure for the extraction, estimation and retrieval of hierarchical phrase-based grammars. We design and implement grammar extraction as a series of Hadoop MapReduce jobs. We store the resulting grammar in the HFile format, which offers competitive trade-offs in terms of efficiency and simplicity. We demonstrate improvements over two alternative solutions used in machine translation. The modular nature of the SMT pipeline allows individual modules to be improved, but has the disadvantage that errors committed by one module are propagated to the next. This thesis alleviates this issue between the word alignment module and the grammar extraction and estimation module by using richer statistics from word alignment models in extraction. We use alignment link and alignment phrase pair posterior probabilities for grammar extraction and estimation, and demonstrate translation improvements in Chinese-to-English translation. This thesis also proposes refinements in grammar and language modelling, both in the context of domain adaptation and in the context of the interaction between first-pass decoding and lattice rescoring. We analyse alternative strategies for grammar and language model cross-domain adaptation. We also study interactions between first-pass and second-pass language model in
