MultiWOZ 2.3: A Multi-domain Task-Oriented Dialogue Dataset Enhanced with Annotation Corrections and Co-Reference Annotation

Task-oriented dialogue systems have made unprecedented progress with multiple state-ofthe-art (SOTA) models underpinned by a number of publicly available MultiWOZ datasets. Dialogue state annotations are error-prone, leading to sub-optimal performance. Various efforts have been put in rectifying the annotation errors presented in the original MultiWOZ dataset. In this paper, we introduce MultiWOZ 2.3, in which we differentiate incorrect annotations in dialogue acts from dialogue states, identifying a lack of co-reference when publishing the updated dataset. To ensure consistency between dialogue acts and dialogue states, we implement co-reference features and unify annotations of dialogue acts and dialogue states. We update the state of the art performance of natural language understanding and dialogue state tracking on MultiWOZ 2.3, where the results show significant improvements than on previous versions of MultiWOZ datasets (2.0-2.2).

[1]  Gökhan Tür,et al.  User Modeling for Task Oriented Dialogues , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).

[2]  Minlie Huang,et al.  Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog , 2019, EMNLP.

[3]  Tae-Yoon Kim,et al.  SUMBT: Slot-Utterance Matching for Universal and Scalable Belief Tracking , 2019, ACL.

[4]  Changjian Hu,et al.  GECOR: An End-to-End Generative Ellipsis and Co-reference Resolution Model for Task-Oriented Dialogue , 2019, EMNLP.

[5]  Dilek Z. Hakkani-Tür,et al.  DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue , 2020, ArXiv.

[6]  Gyuwan Kim,et al.  Efficient Dialogue State Tracking by Selectively Overwriting Memory , 2020, ACL.

[7]  Stefan Ultes,et al.  MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling , 2018, EMNLP.

[8]  Jindong Chen,et al.  MultiWOZ 2.2 : A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines , 2020, NLP4CONVAI.

[9]  Cheng Niu,et al.  Improving Multi-turn Dialogue Modelling with Utterance ReWriter , 2019, ACL.

[10]  Jianmo Ni,et al.  Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation , 2019, EMNLP.

[11]  Richard Socher,et al.  Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems , 2019, ACL.

[12]  Hui Ye,et al.  Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System , 2007, NAACL.

[13]  Yan Wang,et al.  Improving Open-Domain Dialogue Systems via Multi-Turn Incomplete Utterance Restoration , 2019, EMNLP.

[14]  Raghav Gupta,et al.  Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset , 2020, AAAI.

[15]  David Vandyke,et al.  A Network-based End-to-End Trainable Task-oriented Dialogue System , 2016, EACL.

[16]  Antoine Raux,et al.  The Dialog State Tracking Challenge , 2013, SIGDIAL Conference.

[17]  Mihail Eric,et al.  MultiWOZ 2. , 2019 .

[18]  Zheng Zhang,et al.  CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset , 2020, Transactions of the Association for Computational Linguistics.

[19]  Dilek Z. Hakkani-Tür,et al.  Dialog State Tracking: A Neural Reading Comprehension Approach , 2019, SIGdial.

[20]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[21]  Sungjin Lee,et al.  ConvLab: Multi-Domain End-to-End Dialog System Platform , 2019, ACL.

[22]  Yi Guo,et al.  Slot Attention with Value Normalization for Multi-domain Dialogue State Tracking , 2020, EMNLP.

[23]  Wenhu Chen,et al.  Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention , 2019, ACL.

[24]  Matthew Henderson,et al.  The Second Dialog State Tracking Challenge , 2014, SIGDIAL Conference.

[25]  Jianfeng Gao,et al.  ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems , 2020, ACL.

[26]  Nurul Lubis,et al.  TripPy: A Triple Copy Strategy for Value Independent Neural Dialog State Tracking , 2020, SIGdial.

[27]  Philip S. Yu,et al.  Find or Classify? Dual Strategy for Slot-Value Predictions on Multi-Domain Dialog State Tracking , 2019, STARSEM.

[28]  Henrique Lopes Cardoso,et al.  Coreference Resolution: Toward End-to-End and Cross-Lingual Systems , 2020, Inf..

[29]  Maxine Eskénazi,et al.  Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models , 2019, NAACL.

[30]  Danqi Chen,et al.  of the Association for Computational Linguistics: , 2001 .