The CRINGE Loss: Learning what language not to model

Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples. Growing evidence shows that even with very large amounts of positive training data, issues remain that can be alleviated with relatively small amounts of negative data, i.e., examples of what the model should not do. In this work, we propose a novel procedure to train with such data called the CRINGE loss (ContRastive Iterative Negative GEneration). We show the effectiveness of this approach across three different experiments on the tasks of safe generation, contradiction avoidance, and open-domain dialogue. Our models outperform multiple strong baselines while remaining conceptually simple and easy to train and implement.
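The abstract does not spell out the loss itself, so the following is a minimal PyTorch sketch of one plausible reading of a token-level contrastive penalty on negative examples: for each token of a known-bad sequence, a contrasting token is sampled from the model's own top-k predictions and a pairwise softmax pushes the bad token's score below it. The function name `cringe_negative_loss`, the `top_k` parameter, and the exact sampling scheme are illustrative assumptions, not the authors' published formulation.

```python
import torch
import torch.nn.functional as F


def cringe_negative_loss(logits: torch.Tensor,
                         neg_tokens: torch.Tensor,
                         top_k: int = 5) -> torch.Tensor:
    """Illustrative contrastive penalty for tokens of a NEGATIVE sequence.

    logits:     (batch, seq_len, vocab) model scores at each position.
    neg_tokens: (batch, seq_len) token ids of the known-bad sequence.
    """
    flat_logits = logits.reshape(-1, logits.size(-1))  # (B*T, V)
    flat_neg = neg_tokens.reshape(-1, 1)               # (B*T, 1)

    # Score the model currently assigns to each bad token.
    neg_scores = flat_logits.gather(1, flat_neg).squeeze(1)

    # Exclude the bad token, then sample a contrasting "positive"
    # token from the model's own remaining top-k predictions.
    masked = flat_logits.scatter(1, flat_neg, float("-inf"))
    topk_scores, _ = masked.topk(top_k, dim=1)         # (B*T, k)
    choice = torch.multinomial(F.softmax(topk_scores, dim=1), 1)
    pos_scores = topk_scores.gather(1, choice).squeeze(1)

    # Pairwise (two-class) softmax: -log sigma(s_pos - s_neg),
    # i.e. the sampled token should outscore the bad token.
    return -F.logsigmoid(pos_scores - neg_scores).mean()


if __name__ == "__main__":
    # Toy check with random scores standing in for a model's output.
    logits = torch.randn(2, 7, 100, requires_grad=True)
    neg_tokens = torch.randint(0, 100, (2, 7))
    loss = cringe_negative_loss(logits, neg_tokens)
    loss.backward()
    print(float(loss))
```

In the full method, a penalty of this kind would presumably be combined with the standard cross-entropy loss on positive examples and, per the "Iterative" in the name, applied over repeated rounds in which the model's own generations are labeled and fed back as new training data; both details are inferred from the abstract rather than specified by it.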
