Human Evaluation of Multi-modal Neural Machine Translation: A Case-Study on E-Commerce Listing Titles

In this paper, we study how humans perceive the use of images as an additional knowledge source for machine-translating user-generated product listings in an e-commerce company. We conduct a human evaluation to assess how a multi-modal neural machine translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attention-based NMT model and a phrase-based statistical machine translation (PBSMT) model. We evaluate translations obtained with the different systems and also discuss the data set of user-generated product listings, which in our case comprises both product listings and their associated images. We found that humans preferred translations obtained with the PBSMT system to those of both the text-only and the multi-modal NMT models over 56% of the time. Nonetheless, human evaluators ranked translations from the multi-modal NMT model as better than those of the text-only NMT model over 88% of the time, which suggests that images do help NMT in this use case.
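Figures such as "over 56% of the time" can be read as pairwise preference rates computed from evaluators' rankings. The sketch below shows that arithmetic under stated assumptions: each judgement ranks the three systems' outputs for one listing (lower rank is better, equal ranks are ties), and tied comparisons are left out of the denominator. The abstract does not say how ties were handled, and the system labels `PBSMT`, `NMT`, `NMT_m` and the toy judgements are hypothetical, not data from the study.

```python
from collections import defaultdict
from itertools import combinations

# One judgement = one evaluator's ranking of the systems' outputs for one listing.
# Lower rank means better; equal ranks mean a tie. (Hypothetical data format.)
Ranking = dict[str, int]


def pairwise_preference(judgements: list[Ranking]) -> dict[tuple[str, str], float]:
    """For each ordered pair (a, b), return the fraction of non-tied
    judgements in which system a was ranked strictly better than b."""
    wins: defaultdict[tuple[str, str], int] = defaultdict(int)
    decided: defaultdict[tuple[str, str], int] = defaultdict(int)
    for ranking in judgements:
        for a, b in combinations(sorted(ranking), 2):
            if ranking[a] == ranking[b]:
                continue  # assumption: ties are excluded from the denominator
            decided[(a, b)] += 1
            decided[(b, a)] += 1
            winner, loser = (a, b) if ranking[a] < ranking[b] else (b, a)
            wins[(winner, loser)] += 1
    return {pair: wins[pair] / n for pair, n in decided.items()}


# Toy, hypothetical judgements (not results from the paper).
judgements = [
    {"PBSMT": 1, "NMT": 3, "NMT_m": 2},
    {"PBSMT": 2, "NMT": 3, "NMT_m": 1},
    {"PBSMT": 1, "NMT": 2, "NMT_m": 2},
    {"PBSMT": 1, "NMT": 1, "NMT_m": 2},
]
prefs = pairwise_preference(judgements)
print(round(prefs[("NMT_m", "NMT")], 3))  # → 0.667
print(prefs[("PBSMT", "NMT_m")])          # → 0.75
```

Because ties are dropped, the rates for (a, b) and (b, a) always sum to 1; counting ties in the denominator instead would make both rates smaller, so a reported percentage depends on which convention the evaluation used.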
