The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents

We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images. By multi-tasking on such a broad large-scale set of data, we hope to both move towards and measure progress in producing a single unified agent that can perceive, reason and converse with humans in an open-domain setting. We show that such multi-tasking improves over a BERT pre-trained baseline, largely due to multi-tasking with very large dialogue datasets in a similar domain, and that the multi-tasking in general provides gains to both text and image-based tasks using several metrics in both the fine-tune and task transfer settings. We obtain state-of-the-art results on many of the tasks, providing a strong baseline for this challenge.

[1]  Jason Weston,et al.  ELI5: Long Form Question Answering , 2019, ACL.

[2]  Eunsol Choi,et al.  QuAC: Question Answering in Context , 2018, EMNLP.

[3]  Omer Levy,et al.  BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.

[4]  Mitesh M. Khapra,et al.  Towards Exploiting Background Knowledge for Building Conversation Systems , 2018, EMNLP.

[5]  Mari Ostendorf,et al.  LSTM based Conversation Models , 2016, ArXiv.

[6]  Richard Socher,et al.  A Deep Reinforced Model for Abstractive Summarization , 2017, ICLR.

[7]  Thomas Wolf,et al.  TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents , 2019, ArXiv.

[8]  Jianfeng Gao,et al.  A Neural Network Approach to Context-Sensitive Generation of Conversational Responses , 2015, NAACL.

[9]  Xiaodong Liu,et al.  Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading , 2019, ACL.

[10]  Cristian Danescu-Niculescu-Mizil,et al.  Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs , 2011, CMCL@ACL.

[11]  Jason Weston,et al.  Learning to Speak and Act in a Fantasy Text Adventure Game , 2019, EMNLP.

[12]  Jianfeng Gao,et al.  Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation , 2017, IJCNLP.

[13]  Kaiming He,et al.  Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.

[14]  Jason Weston,et al.  ParlAI: A Dialog Research Software Platform , 2017, EMNLP.

[15]  Bing Liu,et al.  Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models , 2019, ArXiv.

[16]  Cho-Jui Hsieh,et al.  VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.

[17]  Joelle Pineau,et al.  The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems , 2015, SIGDIAL Conference.

[18]  Jason Weston,et al.  Wizard of Wikipedia: Knowledge-Powered Conversational agents , 2018, ICLR.

[19]  Bing Liu,et al.  Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models , 2019, ArXiv.

[20]  MAKING KNIGHTS SMILE IN A FANTASY GAME WORLD , 2019 .

[21]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[22]  Colin Raffel,et al.  Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..

[23]  Jason Weston,et al.  Learning from Dialogue after Deployment: Feed Yourself, Chatbot! , 2019, ACL.

[24]  Joelle Pineau,et al.  The Second Conversational Intelligence Challenge (ConvAI2) , 2019, The NeurIPS '18 Competition.

[25]  Alec Radford,et al.  Improving Language Understanding by Generative Pre-Training , 2018 .

[26]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[27]  Jason Weston,et al.  ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons , 2019, ArXiv.

[28]  Xiaoyu Shen,et al.  DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset , 2017, IJCNLP.

[29]  Jason Weston,et al.  What makes a good conversation? How controllable attributes affect human judgments , 2019, NAACL.

[30]  Yejin Choi,et al.  The Curious Case of Neural Text Degeneration , 2019, ICLR.

[31]  Richard Socher,et al.  The Natural Language Decathlon: Multitask Learning as Question Answering , 2018, ArXiv.

[32]  Jason Weston,et al.  Real-time Inference in Multi-sentence Tasks with Deep Pretrained Transformers , 2019, ArXiv.

[33]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Tomas Mikolov,et al.  Bag of Tricks for Efficient Text Classification , 2016, EACL.

[35]  Dilek Z. Hakkani-Tür,et al.  Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations , 2019, INTERSPEECH.

[36]  Jonathan Berant,et al.  MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension , 2019, ACL.

[37]  Xiaodong Liu,et al.  Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval , 2015, NAACL.

[38]  José M. F. Moura,et al.  Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Jason Weston,et al.  Engaging Image Captioning via Personality , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Lav R. Varshney,et al.  CTRL: A Conditional Transformer Language Model for Controllable Generation , 2019, ArXiv.

[41]  Jason Weston,et al.  I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents , 2019, ArXiv.

[42]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[43]  Pararth Shah,et al.  Memory Grounded Conversational Reasoning , 2019, EMNLP/IJCNLP.

[44]  Antoine Bordes,et al.  Training Millions of Personalized Dialogue Agents , 2018, EMNLP.

[45]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[46]  Y-Lan Boureau,et al.  Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset , 2018, ACL.

[47]  Marc'Aurelio Ranzato,et al.  Real or Fake? Learning to Discriminate Machine from Human Generated Text , 2019, ArXiv.

[48]  Stefan Lee,et al.  ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.

[49]  Ray Kurzweil,et al.  Learning Semantic Textual Similarity from Conversations , 2018, Rep4NLP@ACL.

[50]  Mohit Bansal,et al.  LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.

[51]  R'emi Louf,et al.  HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.

[52]  Jason Weston,et al.  Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring , 2019 .

[53]  Jason Weston,et al.  Personalizing Dialogue Agents: I have a dog, do you have pets too? , 2018, ACL.