GREAT: open source software for statistical machine translation

This article describes the first public release of GREAT, an open-source software toolkit for statistical machine translation (SMT). GREAT is based on a bilingual language modelling approach to SMT, which has so far been implemented for n-gram models within the framework of stochastic finite-state transducers. The use of finite-state models is motivated by their simplicity, their versatility, and their lower computational cost compared with other, more expressive models. Moreover, if translation is assumed to be a subsequential process, finite-state models are sufficient for modelling the relations between a source and a target language. GREAT includes some features usually present in state-of-the-art SMT systems, such as phrase-based translation models and a log-linear framework for local features. Experimental results on the well-known Europarl corpus are reported in order to validate the software. Competitive translation quality is achieved while using both fewer model parameters and a lower response time than Moses, the widely used state-of-the-art SMT system.
