Guesswork for Inference in Machine Translation with Seq2seq Model

Machine translation systems today typically use one-shot inference, in which the model commits to a single output translation. In practice, the output probability distribution is often not concentrated, since a source sentence may have multiple valid translations. We therefore propose a multi-shot inference mechanism in this paper. We analyze the Markovian property of the sequence-to-sequence (seq2seq) model. Based on a large deviation principle satisfied by guesswork on Markov processes, we derive theoretical upper bounds on the accuracy of the seq2seq model with a single correct answer under both one-shot and multi-shot inference. We establish analogous bounds for the case where there are multiple correct translations. We also discuss extending the results to translation with distortion tolerance.
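To make the contrast between one-shot and multi-shot inference concrete, the following is a minimal sketch on a toy distribution over candidate translations. It is not the paper's method or its bounds: the candidate strings, the probabilities, and the function names (`top_k_accuracy`, `guesswork`, `expected_guesswork`) are all hypothetical, and the sketch assumes exactly one candidate is correct and that it is drawn from the model's own distribution. Guesswork here means the number of guesses needed when candidates are tried in order of decreasing probability.

```python
from typing import Dict

# Hypothetical output distribution over candidate translations (illustrative only).
toy_dist: Dict[str, float] = {
    "the cat sits": 0.40,
    "the cat is sitting": 0.35,
    "a cat sits": 0.15,
    "cat sitting": 0.10,
}


def top_k_accuracy(dist: Dict[str, float], k: int) -> float:
    """Probability that the correct answer is among the k most likely candidates.

    Assumes the correct answer is distributed according to `dist`.
    k = 1 corresponds to one-shot inference, k > 1 to multi-shot inference.
    """
    ranked = sorted(dist.values(), reverse=True)
    return sum(ranked[:k])


def guesswork(dist: Dict[str, float], target: str) -> int:
    """Number of guesses needed to hit `target` when guessing in
    order of decreasing probability (1-indexed)."""
    ranked = sorted(dist, key=dist.get, reverse=True)
    return ranked.index(target) + 1


def expected_guesswork(dist: Dict[str, float]) -> float:
    """Expected number of guesses under the optimal (descending) guessing order."""
    ranked = sorted(dist.values(), reverse=True)
    return sum(rank * p for rank, p in enumerate(ranked, start=1))


if __name__ == "__main__":
    print("one-shot accuracy:", top_k_accuracy(toy_dist, 1))
    print("two-shot accuracy:", top_k_accuracy(toy_dist, 2))
    print("guesswork for 'a cat sits':", guesswork(toy_dist, "a cat sits"))
    print("expected guesswork:", expected_guesswork(toy_dist))
```

In this toy case a second guess raises the chance of producing the correct translation from roughly 0.40 to roughly 0.75, which illustrates why a non-concentrated output distribution motivates allowing more than one guess.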
