If beam search is the answer, what was the question?

Quite surprisingly, exact maximum a posteriori (MAP) decoding of neural language generators frequently leads to low-quality results. Instead, most state-of-the-art results on language generation tasks are attained using beam search despite its overwhelmingly high search error rate. This implies that the MAP objective alone does not express the properties we desire in text, which raises the question: if beam search is the answer, what was the question? We frame beam search as the exact solution to a different decoding objective in order to gain insights into why high probability under a model alone may not indicate adequacy. We find that beam search enforces uniform information density in text, a property motivated by cognitive science. We suggest a set of decoding objectives that explicitly enforce this property and find that exact decoding with these objectives alleviates the problems encountered when decoding poorly calibrated language generation models. Additionally, we analyze the text produced using various decoding strategies and see that, in our neural machine translation experiments, the extent to which this property is adhered to strongly correlates with BLEU.
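To make the framing concrete, the sketch below shows one way a beam search could trade off log-probability against a uniform-information-density regularizer. It is a minimal illustration under stated assumptions, not the paper's implementation: the variance-of-surprisals penalty, the `next_log_probs` callback, the `uid_penalty` helper, and the weight `lam` are all illustrative names introduced here.

```python
import math
from typing import Callable, List, Tuple

def uid_penalty(surprisals: List[float]) -> float:
    """Variance of per-token surprisals (one plausible UID regularizer)."""
    if not surprisals:
        return 0.0
    mean = sum(surprisals) / len(surprisals)
    return sum((s - mean) ** 2 for s in surprisals) / len(surprisals)

def beam_search(
    next_log_probs: Callable[[Tuple[int, ...]], List[Tuple[int, float]]],
    eos: int,
    beam_size: int = 5,
    max_len: int = 50,
    lam: float = 1.0,
) -> Tuple[int, ...]:
    """Return the best hypothesis under log p(y|x) - lam * uid_penalty(y)."""
    # Each hypothesis: (tokens, total log-probability, per-token surprisals).
    beams: List[Tuple[Tuple[int, ...], float, List[float]]] = [((), 0.0, [])]
    finished: List[Tuple[Tuple[int, ...], float]] = []

    for _ in range(max_len):
        candidates = []
        for tokens, logp, surps in beams:
            # Expand each partial hypothesis by every candidate next token.
            for tok, lp in next_log_probs(tokens):
                new_surps = surps + [-lp]  # surprisal of the new token
                score = (logp + lp) - lam * uid_penalty(new_surps)
                candidates.append((score, tokens + (tok,), logp + lp, new_surps))
        candidates.sort(key=lambda c: c[0], reverse=True)

        beams = []
        for score, tokens, logp, surps in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, logp, surps))
        if not beams:
            break

    if not finished:  # fall back to the surviving unfinished hypotheses
        finished = [
            (tokens, logp - lam * uid_penalty(surps)) for tokens, logp, surps in beams
        ]
    return max(finished, key=lambda f: f[1])[0]
```

Setting `lam = 0` recovers ordinary beam search over log-probability alone; a positive `lam` penalizes hypotheses whose per-token surprisals are unevenly distributed.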
