Active learning for interactive machine translation

Translation needs have increased greatly in recent years. In many situations, the text to be translated constitutes an unbounded stream of data that grows continually over time. An effective approach to translating text documents is the interactive-predictive paradigm, in which the system is guided by the user and the user is assisted by the system to generate error-free translations. Unfortunately, when processing such unbounded data streams, even this approach requires an overwhelming amount of human effort. It is in this scenario that the use of active learning techniques is compelling. In this work, we propose different active learning techniques for interactive machine translation. Results show that, for a given translation quality, the use of active learning greatly reduces the human effort required to translate the sentences in the stream.
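
The general idea of applying active learning to a stream of sentences can be illustrated with a minimal sketch. It is not the method proposed in this work, whose sampling techniques are not detailed in the abstract. It assumes the stream arrives in blocks, that a fixed fraction of each block is sent to the user for supervision, and that the remaining sentences are translated automatically. All names (`process_stream`, `translate`, `supervise`, `update`, `score`, `fraction`) are hypothetical placeholders, and the scoring function is left to the caller.

```python
from typing import Callable, Iterable, List, Set, Tuple


def select_for_supervision(block: List[str],
                           score: Callable[[str], float],
                           budget: int) -> Set[int]:
    """Return the indices of the `budget` highest-scoring sentences in a block."""
    ranked = sorted(range(len(block)), key=lambda i: score(block[i]), reverse=True)
    return set(ranked[:budget])


def process_stream(blocks: Iterable[List[str]],
                   translate: Callable[[str], str],
                   supervise: Callable[[str, str], str],
                   update: Callable[[str, str], None],
                   score: Callable[[str], float],
                   fraction: float = 0.5) -> List[Tuple[str, str, bool]]:
    """Translate a stream block by block, asking the user about a fraction of it.

    Assumption: `supervise` stands in for the interactive session in which the
    user corrects the system hypothesis, and `update` stands in for retraining
    the model with the corrected pair.
    """
    output = []
    for block in blocks:
        budget = int(len(block) * fraction)
        chosen = select_for_supervision(block, score, budget)
        for i, source in enumerate(block):
            hypothesis = translate(source)
            if i in chosen:
                # User-validated translation replaces the automatic one
                # and is fed back to the model.
                hypothesis = supervise(source, hypothesis)
                update(source, hypothesis)
            output.append((source, hypothesis, i in chosen))
    return output
```

In this sketch, human effort is bounded by `fraction`, and the choice of `score` decides which sentences are considered worth that effort.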
