Estimating Grammar Correctness for a Priori Estimation of Machine Translation Post-Editing Effort

We present a pilot supervised-learning application for estimating the reusability of Machine Translation (MT) output, with the aim of supporting human post-editors of MT content. We train our model on typed dependencies (labeled grammar relationships) extracted from human reference and raw MT data, and then predict correctness values for individual grammar relationships, which we aggregate into a binary segment-level evaluation. In view of scaling up to larger data, we provide implemented
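
The aggregation step described above, from per-relation correctness values to a binary segment-level decision, can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so the rule used here (fraction of relations judged correct, compared against a cut-off) is an assumption for illustration only, as are the names `RelationPrediction`, `segment_is_reusable`, and both threshold values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RelationPrediction:
    """One typed dependency in an MT segment (hypothetical container)."""
    label: str        # typed-dependency label, e.g. "nsubj"
    p_correct: float  # classifier's estimated probability that the relation is correct


def segment_is_reusable(
    predictions: Sequence[RelationPrediction],
    relation_threshold: float = 0.5,  # assumed cut-off for a single relation
    segment_threshold: float = 0.8,   # assumed share of correct relations required
) -> bool:
    """Aggregate per-relation predictions into a binary segment-level label.

    Assumed rule: a segment counts as reusable when the fraction of relations
    predicted correct reaches segment_threshold.
    """
    if not predictions:
        # No extracted relations: conservatively flag the segment as not reusable.
        return False
    n_correct = sum(1 for p in predictions if p.p_correct >= relation_threshold)
    return n_correct / len(predictions) >= segment_threshold


# Toy segment with four predicted relations (made-up values).
segment = [
    RelationPrediction("nsubj", 0.90),
    RelationPrediction("dobj", 0.70),
    RelationPrediction("det", 0.95),
    RelationPrediction("amod", 0.30),
]
print(segment_is_reusable(segment))
```

With these toy values, three of the four relations pass the per-relation cut-off, so the outcome depends on where `segment_threshold` is set; in practice both thresholds would need tuning on held-out post-editing data.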
