Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

We present our view of what is necessary to build an engaging open-domain conversational agent, covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not yet filled. We present a biased view, focusing on work done by our own group, while citing related work in each area. In particular, we discuss in detail the properties of continual learning, providing engaging content, and being well-behaved, and how to measure success in providing them. We end with a discussion of our experience and lessons learned, and our recommendations to the community.
