The CRINGE Loss: Learning what language not to model

Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples. Growing evidence shows that even with very large amounts of positive training data, issues remain that can be alleviated with relatively small amounts of negative data, i.e., examples of what the model should not do. In this work, we propose a novel procedure to train with such data called the CRINGE loss (ContRastive Iterative Negative GEneration). We show the effectiveness of this approach across three different experiments on the tasks of safe generation, contradiction avoidance, and open-domain dialogue. Our models outperform multiple strong baselines while remaining conceptually simple and easy to train and implement.
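The abstract does not spell out the loss itself, so the following is a minimal PyTorch sketch of one plausible reading of a token-level contrastive penalty on negative examples: for each token of a known-bad sequence, a contrasting token is sampled from the model's own top-k predictions and a pairwise softmax pushes the bad token's score below it. The function name `cringe_negative_loss`, the `top_k` parameter, and the exact sampling scheme are illustrative assumptions, not the authors' published formulation.

```python
import torch
import torch.nn.functional as F


def cringe_negative_loss(logits: torch.Tensor,
                         neg_tokens: torch.Tensor,
                         top_k: int = 5) -> torch.Tensor:
    """Illustrative contrastive penalty for tokens of a NEGATIVE sequence.

    logits:     (batch, seq_len, vocab) model scores at each position.
    neg_tokens: (batch, seq_len) token ids of the known-bad sequence.
    """
    flat_logits = logits.reshape(-1, logits.size(-1))  # (B*T, V)
    flat_neg = neg_tokens.reshape(-1, 1)               # (B*T, 1)

    # Score the model currently assigns to each bad token.
    neg_scores = flat_logits.gather(1, flat_neg).squeeze(1)

    # Exclude the bad token, then sample a contrasting "positive"
    # token from the model's own remaining top-k predictions.
    masked = flat_logits.scatter(1, flat_neg, float("-inf"))
    topk_scores, _ = masked.topk(top_k, dim=1)         # (B*T, k)
    choice = torch.multinomial(F.softmax(topk_scores, dim=1), 1)
    pos_scores = topk_scores.gather(1, choice).squeeze(1)

    # Pairwise (two-class) softmax: -log sigma(s_pos - s_neg),
    # i.e. the sampled token should outscore the bad token.
    return -F.logsigmoid(pos_scores - neg_scores).mean()


if __name__ == "__main__":
    # Toy check with random scores standing in for a model's output.
    logits = torch.randn(2, 7, 100, requires_grad=True)
    neg_tokens = torch.randint(0, 100, (2, 7))
    loss = cringe_negative_loss(logits, neg_tokens)
    loss.backward()
    print(float(loss))
```

In the full method, a penalty of this kind would presumably be combined with the standard cross-entropy loss on positive examples and, per the "Iterative" in the name, applied over repeated rounds in which the model's own generations are labeled and fed back as new training data; both details are inferred from the abstract rather than specified by it.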
