Machine Translation Quality Estimation Across Domains

Machine Translation (MT) Quality Estimation (QE) aims to automatically measure the quality of MT system output without reference translations. Despite the progress of recent years, current MT QE systems cannot cope with training and test data drawn from different distributions or domains, nor with scenarios in which training data is scarce. We investigate several multitask learning methods that address these limitations and show that they outperform current state-of-the-art methods in real-world conditions where training and test data come from different domains.
