Advances in fully-automatic and interactive phrase-based statistical machine translation

This thesis presents several contributions to the fields of fully-automatic statistical machine translation and interactive statistical machine translation. Statistical machine translation involves three problems that must be addressed: the modelling problem, the training problem and the search problem. This thesis presents contributions to each of them. Regarding the modelling problem, we propose an alternative derivation of phrase-based statistical translation models. This derivation introduces a set of statistical submodels that govern different aspects of the translation process; in addition, the resulting submodels can be introduced as components of a log-linear model. Regarding the training problem, we propose an alternative estimation technique for phrase-based models that tries to reduce the strong heuristic component of the standard estimation technique. The proposed technique considers the phrase pairs that compose the phrase model as parts of complete bisegmentations of the source and target sentences. We demonstrate, both theoretically and empirically, that the proposed estimation technique can be executed efficiently. Experimental results obtained with the open-source Thot toolkit, which is also presented in this thesis, show that the alternative estimation technique yields phrase models with lower perplexity than those obtained by means of the standard estimation technique. However, this reduction in perplexity did not translate into improvements in translation quality. To deal with the search problem, we propose a search algorithm based on the branch-and-bound search paradigm. The proposed algorithm generalises different search strategies, which can be selected by modifying its input parameters. We carried out experiments to evaluate the performance of the proposed search algorithm.
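The idea of counting phrase pairs only as parts of complete bisegmentations can be illustrated with a toy example. The sketch below is not the estimation technique of the thesis: it assumes monotone phrase order, no word-alignment constraints and equal weight for every bisegmentation, and the function names are hypothetical. It enumerates every way of splitting a sentence pair into the same number of contiguous phrases paired in order, then gives each phrase pair a fractional count equal to its expected number of occurrences under a uniform distribution over those bisegmentations.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def segmentations(n, k):
    """All ways to split positions 0..n into k non-empty contiguous spans."""
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        yield [(bounds[i], bounds[i + 1]) for i in range(k)]


def monotone_bisegmentations(src, trg):
    """Yield every bisegmentation of (src, trg) into k phrase pairs,
    pairing the i-th source phrase with the i-th target phrase."""
    for k in range(1, min(len(src), len(trg)) + 1):
        for s_seg in segmentations(len(src), k):
            for t_seg in segmentations(len(trg), k):
                yield [(tuple(src[a:b]), tuple(trg[c:d]))
                       for (a, b), (c, d) in zip(s_seg, t_seg)]


def fractional_phrase_counts(src, trg):
    """Expected occurrences of each phrase pair when every bisegmentation
    is assumed to be equally likely (hypothetical uniform weighting)."""
    bisegs = list(monotone_bisegmentations(src, trg))
    counts = Counter()
    for biseg in bisegs:
        for pair in biseg:
            counts[pair] += Fraction(1, len(bisegs))
    return counts


if __name__ == "__main__":
    for pair, count in fractional_phrase_counts(["la", "casa"], ["the", "house"]).items():
        print(pair, count)
```

For the pair `la casa` / `the house` there are two bisegmentations (a single phrase pair, or two one-word pairs), so each of the three phrase pairs receives a count of 1/2. Exhaustive enumeration grows exponentially with sentence length, so this only illustrates the concept; the efficiency result stated above refers to the thesis's own procedure.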
Additionally, we study an alternative formalisation of the search problem in which the best phrase-level alignment is obtained given the source and target sentences. To solve this problem, smoothing techniques are applied over the phrase table, and the standard search algorithm for phrase-based statistical machine translation is modified to explore the space of possible alignments. Empirical results show that the proposed techniques can be used to generate phrase-based alignments efficiently and robustly. One disadvantage of phrase-based models is their huge size when they are estimated from very large corpora. In this thesis, we propose techniques to alleviate this problem during both the estimation and the decoding stages. For this purpose, main memory requirements are transformed into hard disk requirements. Experimental results show that the hard disk accesses do not significantly decrease the efficiency of the statistical machine translation system. With respect to the contributions in the field of interactive statistical machine translation, on the one hand, we present alternative techniques to implement interactive machine translation systems; on the other hand, we propose an interactive machine translation system that is able to learn from user feedback by means of online learning techniques. We propose two alternative techniques for interactive statistical machine translation. The first one is based on the generation of partial alignments at phrase level; this approach constitutes an application of the phrase-based alignment generation techniques that are also proposed in this thesis. The second proposal tackles the interactive machine translation process by means of word graphs and stochastic error-correction models. This approach differs from other approaches described in the literature in that it introduces error-correction techniques into the statistical framework of the interactive machine translation process.
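To illustrate how an error-correction model lets every hypothesis compete with the user's validated prefix, the sketch below makes two simplifying assumptions. It replaces the word graph with an n-best list of complete hypotheses (a word graph encodes many more paths compactly). It also uses a hypothetical stochastic error model in which the log-probability of the user's prefix, given a hypothesis prefix, is `-alpha` times their Levenshtein distance. For each hypothesis, the code finds the prefix that best matches what the user typed and scores the hypothesis as its translation log-probability plus the error-model log-probability. It then returns the user's prefix followed by the remaining words of the best hypothesis. The names `prefix_edit_distance`, `complete` and `alpha` are illustrative.

```python
def prefix_edit_distance(user_prefix, hyp):
    """Smallest Levenshtein distance between user_prefix and any prefix of hyp.

    Returns (distance, end) such that hyp[:end] is the best-matching prefix;
    ties are broken towards the longer prefix."""
    prev = list(range(len(hyp) + 1))  # distances for an empty user prefix
    for i in range(1, len(user_prefix) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (user_prefix[i - 1] != hyp[j - 1])
            cur[j] = min(substitution, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    end = min(range(len(hyp) + 1), key=lambda j: (prev[j], -j))
    return prev[end], end


def complete(user_prefix, nbest, alpha=1.0):
    """Return user_prefix followed by the best hypothesis's remaining words.

    nbest holds (translation log-probability, token list) pairs; the assumed
    error model gives log p(user_prefix | hyp[:end]) = -alpha * distance."""
    if not nbest:
        raise ValueError("empty n-best list")
    best_score, best_suffix = None, []
    for log_prob, hyp in nbest:
        distance, end = prefix_edit_distance(user_prefix, hyp)
        score = log_prob - alpha * distance
        if best_score is None or score > best_score:
            best_score, best_suffix = score, hyp[end:]
    return list(user_prefix) + list(best_suffix)


if __name__ == "__main__":
    nbest = [(-1.0, "the house is red".split()), (-1.5, "the home is big".split())]
    print(" ".join(complete(["the", "home"], nbest)))              # the home is big
    print(" ".join(complete(["the", "home"], nbest, alpha=0.25)))  # the home is red
```

With `alpha=1.0`, the second hypothesis wins because it matches the prefix `the home` with no edits. With `alpha=0.25`, the single substitution becomes cheap enough for the top-ranked hypothesis to win instead. Breaking ties towards the longer prefix makes a typed word substitute for a hypothesis word rather than count as an insertion.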
We carried out experiments to evaluate the two proposed techniques, which show that they are competitive with state-of-the-art interactive machine translation systems. In addition, these techniques have been used to implement an interactive machine translation prototype following a client-server architecture. Finally, the above-mentioned interactive machine translation system with online learning is based on statistical models that can be incrementally updated. The main difficulty in defining incremental versions of the statistical models involved in the interactive translation process arises when such models are estimated by means of the expectation-maximisation algorithm. To solve this problem, we propose applying the incremental version of this algorithm. The proposed interactive machine translation system with online learning was evaluated empirically, showing that it is able to learn either from scratch or from previously estimated models. The results also show that the interactive machine translation system with online learning significantly outperforms other state-of-the-art systems described in the literature.
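The abstract does not say which EM-trained models are updated or which incremental formulation is used. As an illustration, the sketch assumes an IBM Model 1 lexical table, a classic EM-trained word alignment model. It applies the incremental EM view of Neal and Hinton (1998): store each sentence pair's expected counts, and whenever a pair is processed or revisited, replace its old contribution with a fresh E-step result before renormalising. The add-alpha smoothing, the vocabulary-size bound and all names are assumptions made for this example.

```python
from collections import defaultdict

NULL = "<null>"  # empty source word of IBM Model 1


class IncrementalModel1:
    """Table t(f | e): probability that source word e (or NULL) generates
    target word f, estimated with incremental EM (illustrative sketch)."""

    def __init__(self, alpha=1e-3, vocab_size=1000):
        self.alpha = alpha                 # add-alpha smoothing (assumption)
        self.vocab_size = vocab_size       # assumed target vocabulary size bound
        self.counts = defaultdict(float)   # accumulated expected counts c(f, e)
        self.totals = defaultdict(float)   # accumulated sum over f of c(f, e)
        self.pair_stats = {}               # sentence id -> that pair's own counts

    def t(self, f, e):
        """M-step: smoothed relative frequency of the accumulated counts."""
        num = self.counts.get((f, e), 0.0) + self.alpha
        den = self.totals.get(e, 0.0) + self.alpha * self.vocab_size
        return num / den

    def e_step(self, src, trg):
        """Expected counts of one sentence pair under the current table."""
        stats = defaultdict(float)
        src = [NULL] + list(src)
        for f in trg:
            probs = [self.t(f, e) for e in src]
            norm = sum(probs)
            for e, p in zip(src, probs):
                stats[(f, e)] += p / norm
        return stats

    def update(self, sent_id, src, trg):
        """Swap this pair's old contribution for a fresh E-step result."""
        new = self.e_step(src, trg)
        for (f, e), c in self.pair_stats.get(sent_id, {}).items():
            self.counts[(f, e)] -= c
            self.totals[e] -= c
        for (f, e), c in new.items():
            self.counts[(f, e)] += c
            self.totals[e] += c
        self.pair_stats[sent_id] = new
```

Calling `update` again for an already-seen sentence id replaces that pair's old expected counts instead of adding to them. As a result, the accumulated totals always equal the sum of the current per-sentence statistics. This bookkeeping is what separates incremental EM from naively adding counts every time a sentence pair is seen.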