Statistical Machine Translation: A Guide for Linguists and Translators

This paper presents an overview of Statistical Machine Translation (SMT), which is currently the dominant approach in Machine Translation (MT) research. In Way and Hearne (2010), we describe how central linguists and translators are to the MT process, so that SMT developers and researchers may better understand how to include these groups in continuing to advance the state of the art. If linguists and translators are to make an impact in the field of MT, they need to know how their input is used by SMT systems. Accordingly, our objective in this paper is to present the basic principles underpinning SMT in a way that linguists and translators will find accessible and useful.

[1] Andy Way, et al. Hybrid data-driven models of machine translation, 2005, Machine Translation.

[2] George R. Doddington, et al. Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics, 2002.

[3] Andy Way, et al. On the Role of Translations in State-of-the-Art Statistical Machine Translation, 2011, Lang. Linguistics Compass.

[4] Kevin Knight, et al. Automating Knowledge Acquisition for Machine Translation, 1997, AI Mag.

[5] John Cocke, et al. A Statistical Approach to Machine Translation, 1990, CL.

[6] Hermann Ney, et al. Discriminative Training and Maximum Entropy Models for Statistical Machine Translation, 2002, ACL.

[7] Kevin Knight, et al. Decoding Complexity in Word-Replacement Translation Models, 1999, Comput. Linguistics.

[8] Philip Resnik, et al. Online Large-Margin Training of Syntactic and Structural Translation Features, 2008, EMNLP.

[9] Hermann Ney, et al. A Systematic Comparison of Various Statistical Alignment Models, 2003, CL.

[10] Daniel Marcu, et al. Statistical Phrase-Based Translation, 2003, NAACL.

[11] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[12] Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.

[13] Frederick Jelinek, et al. Statistical methods for speech recognition, 1997.

[14] Hermann Ney, et al. Improved backing-off for M-gram language modeling, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[15] Philipp Koehn, et al. Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models, 2004, AMTA.

[16] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, IEEvaluation@ACL.

[17] Joseph P. Turian, et al. Evaluation of machine translation and its evaluation, 2003, MTSUMMIT.

[18] Philipp Koehn, et al. Re-evaluating the Role of Bleu in Machine Translation Research, 2006, EACL.

[19] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[20] Robert L. Mercer, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, CL.

[21] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.