Triplet Lexicon Models for Statistical Machine Translation

This paper describes a lexical trigger model for statistical machine translation. We present several methods based on triplets, which capture long-distance dependencies that go beyond the local context of phrases or n-gram language models. We evaluate these methods on two translation tasks in a reranking framework and compare them to the related IBM Model 1. We show slightly improved translation quality in terms of BLEU and TER, and we examine various constraints that speed up the Expectation-Maximization-based training and reduce the overall number of triplets without loss in translation performance.
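As a rough illustration of how a triplet lexicon might be used in reranking, the sketch below scores each target word by averaging triplet probabilities over all pairs of source words, then picks the best hypothesis from an n-best list. The abstract does not give the model's exact form, so everything here is an assumption made for illustration: the triplet layout (target word, first source trigger, second source trigger), the uniform average over source word pairs, the probability floor, and the names `triplet_log_score` and `rerank`. In a real system this score would be one feature among several, and the table would come from EM training rather than being written by hand.

```python
import math
from itertools import combinations


def triplet_log_score(source, target, table, floor=1e-9):
    """Log-score of a target sentence under a toy triplet lexicon.

    `table` maps (e, f, f2) -> p(e | f, f2), where f and f2 are two source
    words in sentence order. Unseen triplets fall back to `floor`.
    This layout is an illustrative assumption, not the paper's definition.
    """
    pairs = list(combinations(source, 2))
    if not pairs:
        raise ValueError("need at least two source words to form a trigger pair")
    score = 0.0
    for e in target:
        # Average over all source word pairs, regardless of their distance,
        # which is what lets the model look beyond local context.
        total = sum(table.get((e, f, f2), floor) for f, f2 in pairs)
        score += math.log(total / len(pairs))
    return score


def rerank(source, hypotheses, table):
    """Return the hypothesis with the highest triplet log-score."""
    return max(hypotheses, key=lambda hyp: triplet_log_score(source, hyp, table))


# Hand-written toy table (hypothetical values).
toy_table = {("x", "a", "c"): 0.6}
best = rerank(["a", "b", "c"], [["y"], ["x"]], toy_table)
```

Here `best` is the hypothesis `["x"]`, because the pair of non-adjacent source words `a` and `c` triggers `x`, while `y` only receives the floor probability.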
