Large Scale Translation Quality Estimation

This study explores methods for building a large-scale Quality Estimation framework for Machine Translation. We extend existing Quality Estimation resources to related languages using three transfer learning methods: Transductive SVM, Label Propagation, and Self-taught Learning. Starting from the available labelled datasets, e.g. en-es, we apply these methods to produce a range of Quality Estimation models for Romance languages, while also adapting to subtitling as a new domain. Of the three methods, Self-taught Learning shows the most promising results.
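
To make one of the named techniques concrete, the following is a minimal sketch of Label Propagation in plain Python. It is not the implementation used in the study: the two-dimensional feature vectors, the "good"/"bad" labels, the `sigma` bandwidth and the row-normalised update are all illustrative assumptions. The idea is that a few labelled translations pass their quality labels to nearby unlabelled ones through a similarity graph.

```python
import math


def label_propagation(features, labels, sigma=0.5, iterations=100):
    """Simplified label propagation (illustrative sketch, not the study's code).

    features: list of equal-length numeric tuples, one per sentence pair.
    labels:   list of class names, with None marking unlabelled items.
    Returns a predicted class name for every item.
    """
    n = len(features)
    classes = sorted({lab for lab in labels if lab is not None})

    def affinity(a, b):
        # RBF similarity between two feature vectors.
        dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-dist_sq / (2 * sigma ** 2))

    # Row-normalised transition matrix over the fully connected graph.
    weights = [[affinity(features[i], features[j]) for j in range(n)]
               for i in range(n)]
    transition = [[w / sum(row) for w in row] for row in weights]

    def clamped(i):
        # One-hot distribution for a labelled item.
        return [1.0 if labels[i] == c else 0.0 for c in classes]

    uniform = [1.0 / len(classes)] * len(classes)
    dist = [clamped(i) if labels[i] is not None else list(uniform)
            for i in range(n)]

    for _ in range(iterations):
        # Each item takes the weighted average of its neighbours' distributions.
        spread = [[sum(transition[i][j] * dist[j][k] for j in range(n))
                   for k in range(len(classes))]
                  for i in range(n)]
        # Labelled items are reset to their known label after every step.
        dist = [clamped(i) if labels[i] is not None else spread[i]
                for i in range(n)]

    return [classes[max(range(len(classes)), key=lambda k: dist[i][k])]
            for i in range(n)]


# Hypothetical toy data: two well-separated clusters, one labelled item each.
toy_features = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                (2.0, 2.0), (2.1, 1.9), (1.9, 2.2)]
toy_labels = ["bad", None, None, "good", None, None]
predicted = label_propagation(toy_features, toy_labels)
```

In this toy setup each unlabelled item should inherit the label of the labelled item in its own cluster. A real Quality Estimation setting would use many more features and, possibly, continuous quality scores rather than two classes.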
