Integrating Source-Language Context into Log-Linear Models of Statistical Machine Translation

The translation features typically used in state-of-the-art statistical machine translation (SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A substantial body of research has demonstrated that integrating source context modelling directly into log-linear phrase-based SMT (PB-SMT) and hierarchical PB-SMT (HPB-SMT) can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this thesis we present novel approaches to incorporating source-language contextual modelling into state-of-the-art SMT models in order to enhance the quality of lexical selection. We investigate the effectiveness of a range of contextual features, including lexical features of neighbouring words, part-of-speech tags, supertags, sentence-similarity features, dependency information, and semantic roles. We explore a series of language pairs featuring typologically different languages, and examine the scalability of our approach to larger amounts of training data. While our results are mixed across feature selections, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, supertag features in English-to-Chinese translation, or the combination of supertag and lexical features in English-to-Dutch subtitle translation. Furthermore, we investigate the applicability of our lexical contextual model to another closely related NLP problem, namely machine transliteration.
