I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling

To quantify how well natural language understanding models can capture consistency in a general conversation, we introduce the DialoguE COntradiction DEtection task (DECODE) and a new conversational dataset containing both human-human and human-bot contradictory dialogues. We then compare a structured utterance-based approach of using pre-trained Transformer models for contradiction detection with the typical unstructured approach. Results reveal that: (i) our newly collected dataset is notably more effective at providing supervision for the dialogue contradiction detection task than existing NLI data including those aimed to cover the dialogue domain; (ii) the structured utterance-based approach is more robust and transferable on both analysis and out-of-distribution dialogues than its unstructured counterpart. We also show that our best contradiction detection model correlates well with human judgements and further provide evidence for its usage in both automatically evaluating and improving the consistency of state-of-the-art generative chatbots.
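
The difference between the two approaches named in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated above: that the unstructured approach scores the concatenated history against the last utterance in one pass, and that the utterance-based approach scores each earlier utterance from the same speaker against the last one and keeps the maximum. The `toy_scorer` function is a hypothetical word-overlap stand-in for a fine-tuned Transformer classifier, so the numbers only show the data flow and say nothing about the reported results.

```python
from typing import Callable, List, Set, Tuple

# A scorer maps (premise, hypothesis) to a contradiction score in [0, 1].
Scorer = Callable[[str, str], float]

NEGATIONS = {"not", "don't", "never", "no"}


def tokens(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return {word.strip(".,!?") for word in text.lower().split()}


def toy_scorer(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for a trained classifier (an assumption).

    Returns the share of content words common to both texts when exactly
    one of the two texts contains a negation word, else 0.0.
    """
    p, h = tokens(premise), tokens(hypothesis)
    shared = (p & h) - NEGATIONS
    negation_differs = bool(p & NEGATIONS) != bool(h & NEGATIONS)
    if not negation_differs or not shared:
        return 0.0
    return len(shared) / len((p | h) - NEGATIONS)


def unstructured_score(history: List[str], last: str, scorer: Scorer) -> float:
    """Assumed unstructured variant: one score for the concatenated history."""
    return scorer(" ".join(history), last)


def utterance_based_score(
    history: List[str], last: str, scorer: Scorer
) -> Tuple[float, int]:
    """Assumed structured variant: score each pair, return (max score, index)."""
    scores = [scorer(utterance, last) for utterance in history]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


# Earlier utterances by one speaker, followed by that speaker's latest turn.
history = ["I like fish.", "I work as a teacher.", "I have two dogs."]
last = "I don't like fish."

pair_score, pair_index = utterance_based_score(history, last, toy_scorer)
flat_score = unstructured_score(history, last, toy_scorer)
print(pair_score, pair_index, flat_score)
```

In this toy setting the pairwise variant also returns the index of the earlier utterance that conflicts with the latest turn, which the single concatenated score cannot provide.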

Mohit Bansal | Mary Williamson | Jason Weston | Douwe Kiela | Yixin Nie