Train, Sort, Explain: Learning to Diagnose Translation Models

Evaluating translation models involves a trade-off between effort and detail. At one end of the spectrum are automatic, count-based methods such as BLEU; at the other end are linguistic evaluations by humans, which are arguably more informative but require disproportionately more effort. To narrow this spectrum, we propose a general approach for automatically exposing systematic differences between human and machine translations to human experts. Inspired by adversarial settings, we train a neural text classifier to distinguish human from machine translations. A classifier that performs and generalizes well after training should have learned to recognize systematic differences between the two classes, which we then uncover with neural explainability methods. Our proof-of-concept implementation, DiaMaT, is open source. Applied to a dataset translated by a state-of-the-art neural Transformer model, DiaMaT achieves a classification accuracy of 75% and exposes meaningful differences between human and Transformer translations, amid the current discussion about human parity.
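The pipeline the abstract describes (train a classifier to separate human from machine translations, then attribute its decisions to input tokens) can be illustrated with a minimal sketch. The PyTorch code below is a hypothetical toy, not the DiaMaT implementation: a bag-of-embeddings classifier and gradient-times-input saliency stand in for the paper's deep text classifier and its neural explainability methods, and all names, dimensions, and the random toy data are assumptions.

```python
# Minimal train-sort-explain sketch (toy stand-in, not DiaMaT itself):
# train a binary classifier on human vs. machine translations, then
# score each token's contribution with gradient x input saliency.
import torch
import torch.nn as nn

class TranslationClassifier(nn.Module):
    """Tiny bag-of-embeddings classifier: human (0) vs. machine (1)."""
    def __init__(self, vocab_size: int, emb_dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.head = nn.Linear(emb_dim, 2)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> logits: (batch, 2)
        embedded = self.emb(token_ids)           # (batch, seq, emb)
        return self.head(embedded.mean(dim=1))   # mean-pool over tokens

def token_relevance(model: TranslationClassifier,
                    token_ids: torch.Tensor) -> torch.Tensor:
    """Per-token gradient x input saliency for the 'machine' class."""
    # Detach embeddings into a leaf tensor so we can read their gradient.
    embedded = model.emb(token_ids).detach().requires_grad_(True)
    logits = model.head(embedded.mean(dim=1))
    logits[:, 1].sum().backward()                # 'machine' logit
    # Elementwise product, summed over the embedding dimension.
    return (embedded * embedded.grad).sum(dim=-1)  # (batch, seq)

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TranslationClassifier(vocab_size=1000)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Toy batch of token ids; labels: 0 = human, 1 = machine.
    tokens = torch.randint(0, 1000, (8, 20))
    labels = torch.randint(0, 2, (8,))
    for _ in range(10):                          # a few training steps
        optimizer.zero_grad()
        loss = loss_fn(model(tokens), labels)
        loss.backward()
        optimizer.step()

    # High-relevance tokens indicate what drives 'machine' predictions;
    # sorting sentences by these scores surfaces systematic differences.
    print(token_relevance(model, tokens[:1]))
```

In the paper's setting, the interesting output is not the accuracy itself but the sorted, token-level relevance scores, which a human expert can inspect to characterize how machine translations systematically differ from human ones.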
