Identifying Useful Human Correction Feedback from an On-Line Machine Translation Service

Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for the automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual users is very noisy and must be filtered automatically to identify the potentially useful cases. We present a study on automatic feedback filtering in a real weblog collected from Reverso.net. We extend and re-annotate a training corpus, define an extended set of simple features, and approach the problem as a binary classification task, experimenting with linear and kernel-based classifiers and with feature selection. Results on the filtering task show a significant improvement over the majority-class baseline, but also a precision ceiling of around 70-80%. This reflects the inherent difficulty of the task and indicates that shallow features cannot fully capture its semantic nature. Despite these modest filtering results, the classifiers prove effective in an application-based evaluation: incorporating a filtered set of feedback instances, selected from a larger corpus, significantly improves the performance of a phrase-based SMT system according to a set of standard evaluation metrics.
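The filtering step is framed as binary classification over shallow features of each (source, MT output, user correction) triple. A minimal sketch of that framing follows, under stated assumptions: the `Feedback` record, the three features (a length ratio and two character 3-gram Jaccard overlaps) and all function names are hypothetical illustrations rather than the feature set used in the study, and a plain perceptron stands in for the linear and kernel-based classifiers the study actually evaluates.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    source: str      # sentence the user submitted for translation
    mt_output: str   # translation returned by the MT system
    correction: str  # post-edited translation typed by the user (hypothetical record)


def char_ngrams(text: str, n: int = 3) -> set:
    """Character n-gram set of a lower-cased string."""
    t = text.lower()
    if len(t) < n:
        return {t} if t else set()
    return {t[i:i + n] for i in range(len(t) - n + 1)}


def resemblance(a: str, b: str) -> float:
    """Jaccard overlap of character 3-gram sets (0.0 when both are empty)."""
    x, y = char_ngrams(a), char_ngrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def features(fb: Feedback) -> list:
    """Hypothetical shallow feature vector, not the paper's actual features."""
    mt_len = max(len(fb.mt_output.split()), 1)
    return [
        1.0,                                       # bias term
        len(fb.correction.split()) / mt_len,       # length ratio: correction / MT output
        resemblance(fb.mt_output, fb.correction),  # a genuine post-edit stays close to the MT output
        resemblance(fb.source, fb.correction),     # high value: user may have retyped the source
    ]


def train_perceptron(data, epochs: int = 20) -> list:
    """Plain perceptron on (feature vector, label) pairs with labels in {+1, -1}."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # misclassified or on the boundary: move w towards y * x
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def is_useful(w: list, fb: Feedback) -> bool:
    """Keep the feedback instance if its linear score is positive."""
    return sum(wi * xi for wi, xi in zip(w, features(fb))) > 0
```

For example, training on a genuine post-edit (`Feedback("I have a cat", "Tengo un gata", "Tengo un gato")`, label +1) and on a case where the user simply retyped the source (`Feedback("I have a cat", "Tengo un gato", "I have a cat")`, label -1) yields weights that keep the first and discard the second. The instances that pass such a filter would then be added to the SMT training data, which is the application-based evaluation the abstract describes.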
