LORIA System for the WMT15 Quality Estimation Shared Task

In this paper we present the system we submitted to the WMT12 shared task on Quality Estimation. Each translated sentence is given a score between 1 and 5. The score is predicted from several numerical or boolean features computed from the source and target sentences. We perform a linear regression of the feature space against scores in the range [1, 5]. To this end, we use a Support Vector Machine and experiment with two kernels: linear and radial basis function. Our submission combines the features of the shared task baseline system with our own features, for a total of 66 features. To deal with this large number of features, we propose an in-house feature selection algorithm. Our results show that much of the information is already present in the baseline features, and that our feature selection algorithm discards features which are linearly correlated.
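The abstract does not describe the in-house feature selection algorithm, so the following is not the authors' method. It is only a minimal sketch of the general idea of discarding linearly correlated features, using a greedy Pearson-correlation filter. The function names, the toy data, and the 0.95 threshold are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.95):
    """Greedy filter (hypothetical, not the paper's algorithm).

    Scans feature columns in order and keeps a column only if its absolute
    correlation with every previously kept column is below `threshold`.
    Returns the indices of the kept columns.
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy data: one list per feature, one value per sentence (assumed values).
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],    # feature 0
    [2.0, 4.0, 6.0, 8.0, 10.0],   # feature 1 = 2 * feature 0, fully redundant
    [5.0, 1.0, 4.0, 2.0, 3.0],    # feature 2, weakly correlated with feature 0
]
kept = drop_correlated(features)
print(kept)  # [0, 2]
```

In this toy case feature 1 is an exact multiple of feature 0 and is dropped, while feature 2 is retained. A filter of this kind ignores the target scores entirely, which is one reason a real selection procedure would likely differ from it.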
