Leveraging Sentence Similarity in Natural Language Generation: Improving Beam Search using Range Voting

We propose a method for natural language generation that chooses the most representative output rather than the most likely one. Viewing the language generation process from a voting-theory perspective, we define representativeness using range voting and a similarity measure. The proposed method can be applied when generating from any probabilistic language model, including n-gram models and neural network models. We evaluate different similarity measures on an image captioning task and a machine translation task, and show that our method generates longer and more diverse sentences, providing a solution to the common problem of short outputs being preferred over longer and more informative ones. The generated sentences obtain higher BLEU scores, particularly when the beam size is large. We also perform a human evaluation on both tasks and find that the outputs generated using our method are rated higher.
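To make the selection criterion concrete, below is a minimal sketch of probability-weighted range voting over a fixed candidate set (e.g., the final beam): each candidate acts as a voter whose ballot assigns every candidate a score equal to their similarity, weighted by the voter's own model probability, and the candidate with the highest total wins. The function names, the Jaccard word-overlap similarity, and the example beam are illustrative assumptions for this sketch, not the paper's exact setup; the paper evaluates several different similarity measures.

```python
import math

def range_voting_rerank(candidates, log_probs, similarity):
    """Pick the most representative candidate from a set of hypotheses.

    candidates: list of output strings (e.g., the final beam).
    log_probs:  model log-probabilities, one per candidate.
    similarity: function mapping two strings to a score in [0, 1].
    """
    # Normalise the candidate probabilities so they act as voter weights.
    max_lp = max(log_probs)
    weights = [math.exp(lp - max_lp) for lp in log_probs]
    total = sum(weights)
    weights = [w / total for w in weights]

    best, best_score = None, float("-inf")
    for cand in candidates:
        # Representativeness: probability-weighted similarity to all voters.
        score = sum(w * similarity(cand, voter)
                    for voter, w in zip(candidates, weights))
        if score > best_score:
            best, best_score = cand, score
    return best

def unigram_jaccard(a, b):
    """Placeholder similarity: Jaccard overlap of the word sets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Hypothetical beam and scores: the short hypothesis is the most
# probable, but a longer hypothesis is more similar to the rest of
# the beam and therefore wins the vote.
beam = ["a man rides a horse", "a man riding a horse",
        "a person on a horse", "horse"]
log_probs = [-2.1, -2.3, -2.8, -1.9]
print(range_voting_rerank(beam, log_probs, unigram_jaccard))
```

Note that scoring every candidate against every voter costs O(k^2) similarity evaluations for k candidates, so this reranking step is cheap relative to decoding for typical beam sizes.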
