MultiWOZ 2.2 : A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines

MultiWOZ is a well-known task-oriented dialogue dataset containing over 10,000 annotated dialogues spanning 8 domains. It is extensively used as a benchmark for dialogue state tracking. However, recent works have reported presence of substantial noise in the dialogue state annotations. MultiWOZ 2.1 identified and fixed many of these erroneous annotations and user utterances, resulting in an improved version of this dataset. This work introduces MultiWOZ 2.2, which is a yet another improved version of this dataset. Firstly, we identify and fix dialogue state annotation errors across 17.3% of the utterances on top of MultiWOZ 2.1. Secondly, we redefine the ontology by disallowing vocabularies of slots with a large number of possible values (e.g., restaurant name, time of booking). In addition, we introduce slot span annotations for these slots to standardize them across recent models, which previously used custom string matching heuristics to generate them. We also benchmark a few state of the art dialogue state tracking models on the corrected dataset to facilitate comparison for future work. In the end, we discuss best practices for dialogue data collection that can help avoid annotation errors.

[1]  J. F. Kelley,et al.  An iterative design methodology for user-friendly natural language office information applications , 1984, TOIS.

[2]  Matthew Henderson,et al.  The Second Dialog State Tracking Challenge , 2014, SIGDIAL Conference.

[3]  David Vandyke,et al.  A Network-based End-to-End Trainable Task-oriented Dialogue System , 2016, EACL.

[4]  Dilek Z. Hakkani-Tür,et al.  Scalable multi-domain dialogue state tracking , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[5]  Lu Chen,et al.  Towards Universal Dialogue State Tracking , 2018, EMNLP.

[6]  Qi Hu,et al.  An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking , 2018, ACL.

[7]  Gökhan Tür,et al.  Building a Conversational Agent Overnight with Dialogue Self-Play , 2018, ArXiv.

[8]  Bill Byrne,et al.  Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset , 2019, EMNLP.

[9]  Sungjin Lee,et al.  ConvLab: Multi-Domain End-to-End Dialog System Platform , 2019, ACL.

[10]  Dilek Z. Hakkani-Tür,et al.  HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking , 2019, INTERSPEECH.

[11]  Dilek Z. Hakkani-Tür,et al.  Dialog State Tracking: A Neural Reading Comprehension Approach , 2019, SIGdial.

[12]  Li Zhou,et al.  Multi-domain Dialogue State Tracking as Dynamic Knowledge Graph Enhanced Question Answering , 2019, ArXiv.

[13]  Ian Lane,et al.  BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer , 2019, INTERSPEECH.

[14]  Dilek Z. Hakkani-Tür,et al.  MultiWOZ 2.1: Multi-Domain Dialogue State Corrections and State Tracking Baselines , 2019, ArXiv.

[15]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[16]  Richard Socher,et al.  Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems , 2019, ACL.

[17]  Philip S. Yu,et al.  Find or Classify? Dual Strategy for Slot-Value Predictions on Multi-Domain Dialog State Tracking , 2019, STARSEM.

[18]  Raghav Gupta,et al.  Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset , 2019, AAAI Conference on Artificial Intelligence.

[19]  Anuj Kumar Goyal,et al.  MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines , 2019, LREC.