The Curious Case of Neural Text Degeneration

Despite considerable advancements with deep neural language models, the enigma of neural text degeneration persists when these models are tested as text generators. The counter-intuitive empirical observation is that even though using likelihood as the training objective leads to high-quality models for a broad range of language understanding tasks, using likelihood as a decoding objective produces text that is bland and strangely repetitive. In this paper, we reveal surprising distributional differences between human text and machine text. In addition, we find that decoding strategies alone can dramatically affect the quality of machine text, even when generated from exactly the same neural language model. Our findings motivate Nucleus Sampling, a simple but effective method to draw the best out of neural generation. By sampling from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, Nucleus Sampling yields text that better matches the quality of human text, with enhanced diversity and no sacrifice in fluency or coherence.
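
As a concrete illustration of the mechanism described above, the following is a minimal sketch of nucleus (top-p) sampling applied to a single next-token distribution. It is an illustrative assumption, not the paper's implementation: the helper name nucleus_sample, the NumPy-based setup, the threshold p=0.9, and the toy five-token vocabulary are all chosen here for clarity.

import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample a token index from the smallest prefix of the sorted
    distribution (the "nucleus") whose cumulative mass reaches p."""
    rng = rng or np.random.default_rng()
    # Sort token probabilities from most to least likely.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # Keep the smallest prefix whose cumulative probability reaches p;
    # everything past it is the unreliable tail and is truncated.
    cumulative = np.cumsum(sorted_probs)
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    nucleus = sorted_probs[:cutoff]
    # Renormalize the surviving mass and sample within the nucleus.
    nucleus = nucleus / nucleus.sum()
    choice = rng.choice(len(nucleus), p=nucleus)
    return int(order[choice])

# Toy example: a peaked 5-token distribution. With p=0.9 only the
# high-probability head is eligible; the 0.04 and 0.01 tail is dropped.
probs = np.array([0.55, 0.25, 0.15, 0.04, 0.01])
print(nucleus_sample(probs, p=0.9))

Note that, unlike top-k sampling with a fixed k, the size of the candidate set here adapts to the shape of the distribution: a peaked distribution yields a small nucleus, while a flat distribution admits many candidates.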
