Boosting coherence of language models

The naturalness of long-term information structure, that is, coherence, remains a challenge in language generation. Large language models have insufficiently learned such structure: their long-form generations differ from natural text on measures of coherence. To reduce this divergence, we propose coherence boosting, an inference procedure that increases the influence of distant context on next-token prediction. We demonstrate the benefits of coherence boosting with pretrained models through distributional analyses of generated ordinary text and dialog responses. We also find that coherence boosting with state-of-the-art models on various zero-shot NLP tasks yields performance gains with no additional training.
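
As a concrete illustration of increasing the effect of distant context, the sketch below contrasts a model's next-token predictions given the full context with its predictions given only the most recent tokens, and up-weights the difference. The log-linear combination, the boost weight alpha, the short-context length, and the use of GPT-2 via Hugging Face transformers are illustrative assumptions, not the exact procedure described in the abstract.

    # Minimal sketch (assumed formulation): boost distant-context information by
    # contrasting full-context and short-context next-token predictions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def boosted_next_token_scores(context: str, short_len: int = 10, alpha: float = 0.5):
        """Combine predictions conditioned on the full context with predictions
        conditioned on only the last `short_len` tokens, up-weighting whatever
        the distant part of the context contributes."""
        ids = tokenizer(context, return_tensors="pt").input_ids
        with torch.no_grad():
            full_logits = model(ids).logits[0, -1]                  # conditioned on everything
            short_logits = model(ids[:, -short_len:]).logits[0, -1]  # conditioned on recent tokens only
        full_lp = torch.log_softmax(full_logits, dim=-1)
        short_lp = torch.log_softmax(short_logits, dim=-1)
        # Log-linear contrast: favor tokens that the full context supports
        # more strongly than the short context alone does.
        return (1 + alpha) * full_lp - alpha * short_lp

    scores = boosted_next_token_scores("A long passage whose ending should stay on topic ...")
    print(tokenizer.decode([scores.argmax().item()]))

The returned scores are unnormalized; they suffice for ranking or greedy selection of the next token, and a softmax over them would recover a proper distribution for sampling.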
