Metric for Automatic Machine Translation Evaluation based on Universal Sentence Representations

Sentence representations can capture a wide range of information that cannot be captured by local features based on character or word N-grams. This paper examines the usefulness of universal sentence representations for evaluating the quality of machine translation. Although it is difficult to train sentence representations on the small-scale, manually evaluated datasets available for machine translation, sentence representations trained on large-scale data from other tasks can improve the automatic evaluation of machine translation. Experimental results on the WMT-2016 dataset show that the proposed method achieves state-of-the-art performance using sentence representation features alone.
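The sketch below illustrates the general idea of using pre-trained universal sentence representations as features for a learned evaluation metric: encode the MT hypothesis and the reference with a fixed sentence encoder, combine the two vectors, and fit a regressor on segments with human quality scores. It is a minimal illustration, not the paper's implementation; the sentence-transformers encoder, the "all-MiniLM-L6-v2" model, the scikit-learn MLPRegressor, the feature combination, and the toy training data are all assumptions introduced here for clarity.

```python
# Minimal sketch (assumed setup, not the paper's exact method):
# score MT hypotheses against references with pre-trained sentence embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.neural_network import MLPRegressor

# Any pre-trained sentence encoder can stand in for a "universal" one.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def features(hypotheses, references):
    """Pair hypothesis/reference embeddings into regression features."""
    h = encoder.encode(hypotheses)
    r = encoder.encode(references)
    # Concatenation, element-wise product, and absolute difference are
    # common ways to combine two sentence vectors for a regressor.
    return np.hstack([h, r, h * r, np.abs(h - r)])

# Toy training data: hypotheses, references, and human quality judgments
# (in practice these would be WMT segments with direct-assessment scores).
train_hyps = ["the cat sat on the mat", "he go to school yesterday"]
train_refs = ["the cat sat on the mat", "he went to school yesterday"]
train_scores = [1.0, 0.6]

regressor = MLPRegressor(hidden_layer_sizes=(256,), max_iter=500, random_state=0)
regressor.fit(features(train_hyps, train_refs), train_scores)

# Predict segment-level quality for new MT output.
test_hyps = ["she readed a book"]
test_refs = ["she read a book"]
print(regressor.predict(features(test_hyps, test_refs)))
```

Because the encoder is trained on large-scale data from other tasks and kept fixed, only the lightweight regressor needs to be fit on the small amount of manually evaluated translation data.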
