RankGen: Improving Text Generation with Large Ranking Models

Given an input sequence (or prefix), modern language models often assign high probabilities to output sequences that are repetitive, incoherent, or irrelevant to the prefix; model-generated text consequently contains these artifacts. To address these issues, we present RankGen, a 1.2B-parameter encoder model for English that scores model generations given a prefix. RankGen can be flexibly incorporated as a scoring function in beam search and used to decode from any pretrained language model. We train RankGen with large-scale contrastive learning to map a prefix close to the ground-truth sequence that follows it and far away from two types of negatives: (1) random sequences from the same document as the prefix, and (2) sequences generated from a large language model conditioned on the prefix. Experiments across four different language models (345M-11B parameters) and two domains show that RankGen significantly outperforms decoding algorithms like nucleus, top-k, and typical sampling on both automatic metrics (85.0 vs. 77.3 MAUVE) and human evaluations with English writers (74.5% human preference over nucleus sampling). Analysis reveals that RankGen outputs are more relevant to the prefix and improve continuity and coherence compared to baselines. We release our model checkpoints, code, and human preference data with explanations to facilitate future research.
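To make the two ideas in the abstract concrete, here is a minimal PyTorch sketch, not the released RankGen implementation: a stand-in `Encoder` (the real model is a 1.2B-parameter Transformer encoder), an InfoNCE-style `contrastive_loss` that pulls a prefix embedding toward its gold continuation and away from in-batch and explicit negatives, and a `rerank` helper that scores candidate continuations against the prefix by dot product. All names, shapes, and hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Toy stand-in for the RankGen encoder: maps token ids to one unit-normalized vector per text."""

    def __init__(self, vocab_size: int = 32000, dim: int = 1024):
        super().__init__()
        # Mean-pooled embedding bag as a placeholder for a full Transformer encoder.
        self.embed = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> (batch, dim), L2-normalized.
        return F.normalize(self.embed(token_ids), dim=-1)


def contrastive_loss(prefix_vecs, suffix_vecs, negative_vecs, temperature=0.05):
    """InfoNCE-style objective: each prefix should score its gold suffix higher than
    (a) gold suffixes of other prefixes in the batch (in-batch negatives) and
    (b) explicit negatives (random in-document sequences and LM-generated continuations)."""
    pos_scores = prefix_vecs @ suffix_vecs.T                      # (B, B); diagonal = gold pairs
    neg_scores = prefix_vecs @ negative_vecs.T                    # (B, N_neg)
    logits = torch.cat([pos_scores, neg_scores], dim=1) / temperature
    labels = torch.arange(prefix_vecs.size(0), device=prefix_vecs.device)
    return F.cross_entropy(logits, labels)


@torch.no_grad()
def rerank(encoder, prefix_ids, candidate_ids_list):
    """Score candidate continuations (sampled from any pretrained LM) against the prefix
    and return them best-first. In the beam-search variant described in the abstract,
    this score is applied after every expansion step rather than once over full candidates.
    Assumes all candidates are padded/truncated to the same length."""
    prefix_vec = encoder(prefix_ids.unsqueeze(0))                 # (1, dim)
    cand_vecs = encoder(torch.stack(candidate_ids_list))          # (K, dim)
    scores = (cand_vecs @ prefix_vec.T).squeeze(-1)               # (K,)
    order = scores.argsort(descending=True)
    return [candidate_ids_list[i] for i in order], scores[order]
```

Because the scoring function only needs prefix and candidate embeddings, it is decoder-agnostic: any sampling or search procedure that produces candidate continuations can be reranked this way.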
