Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

Frozen models trained to mimic static datasets can never improve their performance. Models that can employ internet retrieval for up-to-date information and obtain feedback from humans during deployment hold the promise of both adapting to new information and improving their performance. In this work we study how to improve internet-driven conversational skills in such a learning framework. We collect deployment data of human interactions, which we make publicly available, along with various types of human feedback, including binary quality measurements, free-form text feedback, and fine-grained reasons for failure. We then study various algorithms for improving from such feedback, including standard supervised learning, rejection sampling, model-guiding, and reward-based learning, in order to make recommendations on which types of feedback and algorithms work best. We find that the recently introduced DIRECTOR model (Arora et al., 2022) shows significant improvements over other existing approaches.
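Among the algorithms named above, rejection sampling is the simplest to illustrate: draw several candidate responses from the dialogue model and keep only the one a feedback-trained reward model scores highest. The sketch below is a minimal, hypothetical illustration of that loop; `generate_candidates` and `score` are stand-ins (not from the paper), where a real system would sample from the deployed model and score with a classifier trained on the collected human feedback.

```python
def score(response: str) -> float:
    """Hypothetical reward model: a toy stand-in that prefers longer
    responses that end with a question. A real system would use a
    classifier trained on binary human-feedback labels."""
    return len(response) + (5.0 if response.endswith("?") else 0.0)

def generate_candidates(context: str, n: int) -> list[str]:
    """Hypothetical generator: returns canned candidates. A real system
    would sample n responses from the dialogue model given the context."""
    return [
        "I see.",
        "That's interesting, what happened next?",
        "Tell me more about that?",
    ][:n]

def rejection_sample(context: str, n: int = 3) -> str:
    """Sample n candidates and keep the highest-scoring one,
    rejecting the rest."""
    candidates = generate_candidates(context, n)
    return max(candidates, key=score)

print(rejection_sample("Hi, how was your weekend?"))
# prints the candidate the toy reward model scores highest
```

Model-guiding approaches such as DIRECTOR instead fold the classifier into decoding itself, reweighting token probabilities at each step rather than filtering whole responses after generation.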

[1] Eric Michael Smith et al. BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage. ArXiv, 2022.

[2] J. Weston et al. Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls. ArXiv, 2022.

[3] J. Weston et al. Director: Generator-Classifiers For Supervised Language Modeling. AACL, 2022.

[4] Xi Victoria Lin et al. OPT: Open Pre-trained Transformer Language Models. ArXiv, 2022.

[5] Tom B. Brown et al. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. ArXiv, 2022.

[6] J. Weston et al. Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion. EMNLP, 2022.

[7] Ryan J. Lowe et al. Training language models to follow instructions with human feedback. NeurIPS, 2022.

[8] Renelito Delos Santos et al. LaMDA: Language Models for Dialog Applications. ArXiv, 2022.

[9] Jason Weston et al. Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity. NAACL-HLT, 2021.

[10] Jason Weston et al. Internet-Augmented Dialogue Generation. ACL, 2021.

[11] Jason Weston et al. Beyond Goldfish Memory: Long-Term Open-Domain Conversation. ACL, 2021.

[12] Jon Ander Campos et al. Training Language Models with Natural Language Feedback. 2022.

[13] Jeff Wu et al. WebGPT: Browser-assisted question-answering with human feedback. ArXiv, 2021.

[14] Dario Amodei et al. A General Language Assistant as a Laboratory for Alignment. ArXiv, 2021.

[15] Jason Weston et al. Reason first, then respond: Modular Generation for Knowledge-infused Dialogue. EMNLP, 2021.

[16] D. Klein et al. FUDGE: Controlled Text Generation With Future Discriminators. NAACL, 2021.

[17] Zhiyi Ma et al. Dynabench: Rethinking Benchmarking in NLP. NAACL, 2021.

[18] Mohit Bansal et al. I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling. ACL, 2020.

[19] Shafiq R. Joty et al. GeDi: Generative Discriminator Guided Sequence Generation. EMNLP, 2020.

[20] Edouard Grave et al. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. EACL, 2020.

[21] Mary Williamson et al. Recipes for Building an Open-Domain Chatbot. EACL, 2020.

[22] Jason Weston et al. Deploying Lifelong Open-Domain Dialogue Learning. ArXiv, 2020.

[23] Aaron J. Moss et al. Demographic Stability on Mechanical Turk Despite COVID-19. Trends in Cognitive Sciences, 2020.

[24] Mary Williamson et al. Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills. ACL, 2020.

[25] Quoc V. Le et al. Towards a Human-like Open-Domain Chatbot. ArXiv, 2020.

[26] Jeremy Blackburn et al. The Pushshift Reddit Dataset. ICWSM, 2020.

[27] Myle Ott et al. Unsupervised Cross-lingual Representation Learning at Scale. ACL, 2019.

[28] Susan T. Dumais et al. Improving Web Search Ranking by Incorporating User Behavior Information. SIGIR Forum, 2019.

[29] Jason Weston et al. Learning from Dialogue after Deployment: Feed Yourself, Chatbot! ACL, 2019.

[30] Jason Weston et al. Wizard of Wikipedia: Knowledge-Powered Conversational Agents. ICLR, 2018.

[31] Jason Weston et al. Learning Through Dialogue Interactions. ICLR, 2016.

[32] Jason Weston et al. Dialogue Learning With Human-In-The-Loop. ICLR, 2016.

[33] A. Tate. A Measure of Intelligence. 2012.

[34] Estevam R. Hruschka et al. Toward an Architecture for Never-Ending Language Learning. AAAI, 2010.