A Coactive Learning View of Online Structured Prediction in Statistical Machine Translation

We present a theoretical analysis of online parameter tuning in statistical machine translation (SMT) from a coactive learning view. This perspective allows us to give regret and generalization bounds for latent perceptron algorithms that are common in SMT but fall outside the standard convex optimization scenario. Coactive learning also introduces the concept of weak feedback, which we apply to SMT in a proof-of-concept experiment, showing that learning from feedback that consists of slight improvements over predictions leads to convergence in regret and translation error rate. This suggests that coactive learning might be a viable framework for interactive machine translation. Furthermore, we find that surrogate translations, which replace references that are unreachable in the decoder search space, can be interpreted as weak feedback and lead to convergence in learning if they admit an underlying linear model.
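The idea of learning from weak feedback can be illustrated with a minimal sketch of a perceptron-style coactive update. This is a toy simulation under stated assumptions, not the experimental setup of the paper: candidates are random feature vectors rather than translations, the hidden utility `w_true` is assumed to be linear, there are no latent variables, and the simulated user returns the candidate that improves least over the prediction. All names (`coactive_perceptron`, `w_true`, `n_cands`) are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def coactive_perceptron(rounds=200, n_cands=10, dim=5, seed=0):
    """Toy coactive learning loop with weak (slightly improving) feedback.

    Returns the learned weights, the hidden true weights, and the
    per-round regret measured under the hidden utility.
    """
    rng = random.Random(seed)
    # Assumption: the user's utility is linear in the features.
    w_true = [rng.uniform(-1, 1) for _ in range(dim)]
    w = [0.0] * dim
    regrets = []
    for _ in range(rounds):
        # Hypothetical candidate set standing in for a decoder's search space.
        cands = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_cands)]
        pred = max(cands, key=lambda c: dot(w, c))        # system prediction
        best = max(cands, key=lambda c: dot(w_true, c))   # optimal output
        regrets.append(dot(w_true, best) - dot(w_true, pred))
        # Weak feedback: the smallest strict improvement over the prediction,
        # not necessarily the optimal candidate.
        better = [c for c in cands if dot(w_true, c) > dot(w_true, pred)]
        if better:
            feedback = min(better, key=lambda c: dot(w_true, c))
            # Perceptron-style update toward the feedback features.
            w = [wi + f - p for wi, f, p in zip(w, feedback, pred)]
    return w, w_true, regrets
```

Calling `coactive_perceptron()` and comparing the mean of the early and late entries of the returned regret list gives a rough picture of how the learner behaves in this toy setting. Because every update direction has positive utility under `w_true`, the learned weights never point away from the hidden weights.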
