GREAT: open source software for statistical machine translation

This article describes the first public release of GREAT, an open-source software toolkit for statistical machine translation (SMT). GREAT is based on a bilingual language modelling approach to SMT, which has so far been implemented for n-gram models within the framework of stochastic finite-state transducers. The use of finite-state models is motivated by their simplicity, their versatility, and their lower computational cost compared with other, more expressive models. Moreover, if translation is assumed to be a subsequential process, finite-state models are sufficient for modelling the relations between a source and a target language. GREAT includes some features usually found in state-of-the-art SMT systems, such as phrase-based translation models or a log-linear framework for local features. Experimental results on the well-known Europarl corpus are reported to validate the software. GREAT achieves competitive translation quality while using fewer model parameters and a lower response time than the widely used, state-of-the-art SMT system Moses.

Francisco Casacuberta | Jorge González
