A Simple, Fast Diverse Decoding Algorithm for Neural Generation

In this paper, we propose a simple, fast decoding algorithm that fosters diversity in neural generation. The algorithm modifies the standard beam search algorithm by adding an inter-sibling ranking penalty, favoring choosing hypotheses from diverse parents. We evaluate the proposed model on the tasks of dialogue response generation, abstractive summarization and machine translation. We find that diverse decoding helps across all tasks, especially those for which reranking is needed. We further propose a variation that is capable of automatically adjusting its diversity decoding rates for different inputs using reinforcement learning (RL). We observe a further performance boost from this RL technique. This paper includes material from the unpublished script "Mutual Information and Diverse Decoding Improve Neural Machine Translation" (Li and Jurafsky, 2016).

[1]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[2]  Jörg Tiedemann,et al.  News from OPUS — A collection of multilingual parallel corpora with tools and interfaces , 2009 .

[3]  Jason Weston,et al.  A Neural Attention Model for Abstractive Sentence Summarization , 2015, EMNLP.

[4]  Mari Ostendorf,et al.  LSTM based Conversation Models , 2016, ArXiv.

[5]  Lucy Vanderwende,et al.  Exploring Content Models for Multi-Document Summarization , 2009, NAACL.

[6]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[7]  Quoc V. Le,et al.  A Neural Conversational Model , 2015, ArXiv.

[8]  Jingbo Zhu,et al.  Bagging and Boosting statistical machine translation systems , 2013, Artif. Intell..

[9]  Kyunghyun Cho,et al.  Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model , 2016, ArXiv.

[10]  Regina Barzilay,et al.  Towards Multidocument Summarization by Reformulation: Progress and Prospects , 1999, AAAI/IAAI.

[11]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[12]  Gregory Shakhnarovich,et al.  Diverse M-Best Solutions in Markov Random Fields , 2012, ECCV.

[13]  Alexander M. Rush,et al.  Abstractive Sentence Summarization with Attentive Recurrent Neural Networks , 2016, NAACL.

[14]  Christopher D. Manning,et al.  Deep Reinforcement Learning for Mention-Ranking Coreference Models , 2016, EMNLP.

[15]  Spyridon Matsoukas,et al.  Trait-Based Hypothesis Selection For Machine Translation , 2012, HLT-NAACL.

[16]  Phil Blunsom,et al.  Recurrent Continuous Translation Models , 2013, EMNLP.

[17]  Daniel Jurafsky,et al.  A Hierarchical Neural Autoencoder for Paragraphs and Documents , 2015, ACL.

[18]  Wolfgang Macherey,et al.  Lattice-based Minimum Error Rate Training for Statistical Machine Translation , 2008, EMNLP.

[19]  Jianfeng Gao,et al.  A Neural Network Approach to Context-Sensitive Generation of Conversational Responses , 2015, NAACL.

[20]  Xinlei Chen,et al.  Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.

[21]  Gregory Shakhnarovich,et al.  A Systematic Exploration of Diversity in Machine Translation , 2013, EMNLP.

[22]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[23]  Wang Ling,et al.  Learning to Compose Words into Sentences with Reinforcement Learning , 2016, ICLR.

[24]  Regina Barzilay,et al.  Rationalizing Neural Predictions , 2016, EMNLP.

[25]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[26]  Ashwin K. Vijayakumar,et al.  Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models , 2016, ArXiv.

[27]  J. Spall STOCHASTIC OPTIMIZATION , 2002 .

[28]  Denny Britz,et al.  Generating Long and Diverse Responses with Neural Conversation Models , 2017, ArXiv.

[29]  Wojciech Zaremba,et al.  Reinforcement Learning Neural Turing Machines , 2015, ArXiv.

[30]  Bowen Zhou,et al.  Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond , 2016, CoNLL.

[31]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[32]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[33]  Shankar Kumar,et al.  Minimum Bayes-Risk Decoding for Statistical Machine Translation , 2004, NAACL.

[34]  Shankar Kumar,et al.  Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation , 2008, EMNLP.

[35]  Joelle Pineau,et al.  How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation , 2016, EMNLP.

[36]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[37]  Yoshua Bengio,et al.  On Using Monolingual Corpora in Neural Machine Translation , 2015, ArXiv.

[38]  Liang Huang,et al.  Forest Reranking: Discriminative Parsing with Non-Local Features , 2008, ACL.

[39]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[40]  Gholamreza Haffari,et al.  Incorporating Structural Alignment Biases into an Attentional Neural Translation Model , 2016, NAACL.

[41]  Wojciech Zaremba,et al.  Reinforcement Learning Neural Turing Machines - Revised , 2015 .

[42]  Jianfeng Gao,et al.  A Diversity-Promoting Objective Function for Neural Conversation Models , 2015, NAACL.

[43]  Peter W. Glynn,et al.  Likelilood ratio gradient estimation: an overview , 1987, WSC '87.

[44]  Nicola Cancedda,et al.  Minimum Error Rate Training by Sampling the Translation Lattice , 2010, EMNLP.

[45]  Marc'Aurelio Ranzato,et al.  Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.

[46]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[47]  Jianfeng Gao,et al.  Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access , 2016, ACL.

[48]  Rico Sennrich,et al.  Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.

[49]  Ani Nenkova,et al.  The Impact of Frequency on Summarization , 2005 .

[50]  Yang Liu,et al.  Agreement-Based Joint Training for Bidirectional Attention-Based Neural Machine Translation , 2015, IJCAI.

[51]  Andrew Y. Ng,et al.  Solving the Problem of Cascading Errors: Approximate Bayesian Inference for Linguistic Annotation Pipelines , 2006, EMNLP.

[52]  Daniel Jurafsky,et al.  Positive Diversity Tuning for Machine Translation System Combination , 2013, WMT@ACL.