Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations

We present Chirpy Cardinal, an open-domain dialogue agent, as a research platform for the 2019 Alexa Prize competition. Building an open-domain socialbot that talks to real people is challenging: such a system must meet a range of user expectations, including broad world knowledge, conversational style, and emotional connection. Our socialbot engages users on their own terms, prioritizing their interests, feelings, and autonomy. As a result, our socialbot provides a responsive, personalized user experience, capable of talking knowledgeably about a wide variety of topics, as well as chatting empathetically about ordinary life. Neural generation plays a key role in achieving these goals, providing the backbone for our conversational and emotional tone. At the end of the competition, Chirpy Cardinal progressed to the finals with an average rating of 3.6/5.0, a median conversation duration of 2 minutes 16 seconds, and a 90th percentile duration of over 12 minutes.
