PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents

Neural Machine Translation (NMT) has shown dramatic improvements in quality when translating clean input, such as text from the news domain. However, existing studies suggest that NMT still struggles with certain kinds of noisy input, such as User-Generated Content (UGC) on the Internet. To make better use of NMT for cross-cultural communication, one of the most promising directions is to develop a model that correctly handles such expressions. Although the importance of this problem has been recognized, it remains unclear what causes the large gap in performance between the translation of clean input and that of UGC. To answer this question, we present a new dataset, PheMT, for evaluating the robustness of MT systems against specific linguistic phenomena in Japanese-English translation. Our experiments with the created dataset revealed that not only our in-house models but also widely used off-the-shelf systems are greatly disturbed by the presence of certain phenomena.
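
As a concrete illustration of how a phenomenon-wise robustness evaluation can be run, the sketch below scores an MT system separately on each annotated phenomenon. This is a minimal sketch, not the authors' evaluation code: the tab-separated file layout, the column order, the example phenomenon labels, and the translate() placeholder are all assumptions made for illustration; only sacrebleu, a standard BLEU implementation, is a real library.

```python
# Minimal sketch of phenomenon-wise MT evaluation in the spirit of PheMT.
# ASSUMPTION: a tab-separated file with three columns per line --
#   source sentence, reference translation, phenomenon label
# -- where labels might be e.g. "proper noun" or "colloquialism".
# The actual PheMT release format may differ.
import csv
from collections import defaultdict

import sacrebleu  # pip install sacrebleu


def translate(sentences):
    """Placeholder for the MT system under test; swap in a real model."""
    return sentences  # identity "translation", for illustration only


def phenomenon_wise_bleu(tsv_path):
    # Group (hypothesis, reference) pairs by the phenomenon they exhibit.
    buckets = defaultdict(lambda: ([], []))
    with open(tsv_path, encoding="utf-8") as f:
        for src, ref, phenomenon in csv.reader(f, delimiter="\t"):
            hyp = translate([src])[0]
            hyps, refs = buckets[phenomenon]
            hyps.append(hyp)
            refs.append(ref)

    # Corpus-level BLEU per phenomenon shows which ones hurt the most.
    return {
        phenomenon: sacrebleu.corpus_bleu(hyps, [refs]).score
        for phenomenon, (hyps, refs) in buckets.items()
    }


if __name__ == "__main__":
    for phenomenon, bleu in phenomenon_wise_bleu("phemt.tsv").items():
        print(f"{phenomenon}: BLEU = {bleu:.1f}")
```

Breaking the score down by phenomenon, rather than reporting a single corpus-level BLEU, is what makes it possible to see which specific phenomena disturb a given system, which is the diagnostic question the dataset is built to answer.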
