Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model

We propose an automatic evaluation method for machine translation that uses source language sentences as additional pseudo references. The proposed method evaluates a translation hypothesis with a regression model. The model takes the paired source, reference, and hypothesis sentences together as an input. A pretrained, large-scale cross-lingual language model encodes the input into sentence-pair vectors, and the regression model predicts a human evaluation score from those vectors. Our experiments show that the proposed method, using a Cross-lingual Language Model (XLM) trained with a translation language modeling (TLM) objective, achieves a higher correlation with human judgments than a baseline method that uses only hypothesis and reference sentences. We also confirm that using source sentences in the proposed method improves evaluation performance.
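
To make the pipeline concrete, the following is a minimal sketch (not the authors' released implementation) of scoring a hypothesis with a TLM-pretrained XLM encoder and a small regression head. The HuggingFace checkpoint name, the mean-pooling step, the pairing of (reference, hypothesis) and (source, hypothesis), and the MLP sizes are illustrative assumptions rather than details taken from the paper.

```python
# Sketch: score a translation hypothesis with a pretrained cross-lingual LM
# (XLM trained with the TLM objective) feeding sentence-pair vectors into a
# small regression head. Checkpoint name, pooling, and layer sizes are
# illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn
from transformers import XLMTokenizer, XLMModel

CHECKPOINT = "xlm-mlm-tlm-xnli15-1024"  # XLM pretrained with MLM + TLM

tokenizer = XLMTokenizer.from_pretrained(CHECKPOINT)
encoder = XLMModel.from_pretrained(CHECKPOINT)
encoder.eval()


def encode_pair(sent_a: str, sent_b: str) -> torch.Tensor:
    """Encode a sentence pair into one vector by mean-pooling the final
    hidden states (an illustrative pooling choice)."""
    inputs = tokenizer(sent_a, sent_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)              # (dim,)


class ScoreRegressor(nn.Module):
    """MLP mapping concatenated sentence-pair vectors to a scalar score."""

    def __init__(self, dim: int = 1024, hidden: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pair_vectors: torch.Tensor) -> torch.Tensor:
        return self.mlp(pair_vectors).squeeze(-1)


def score(src: str, ref: str, hyp: str, regressor: ScoreRegressor) -> float:
    """Predict a human-evaluation-style score from (source, reference,
    hypothesis); the source acts as an additional pseudo reference."""
    v_ref_hyp = encode_pair(ref, hyp)   # reference vs. hypothesis
    v_src_hyp = encode_pair(src, hyp)   # source vs. hypothesis
    features = torch.cat([v_ref_hyp, v_src_hyp], dim=-1)
    with torch.no_grad():
        return regressor(features).item()


if __name__ == "__main__":
    regressor = ScoreRegressor()  # in practice, trained on human judgments
    print(score("Das ist ein Test.", "This is a test.", "This is test.", regressor))
```

In practice the regression head would be trained on segment-level human judgments, and the reference-only baseline described above would simply omit the (source, hypothesis) vector from the input features.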
