Crowd-Sourced Collection of Task-Oriented Human-Human Dialogues in a Multi-domain Scenario

There is a lack of high-quality corpora for the purposes of training task-oriented, end-to-end dialogue systems. This paper describes a dialogue collection process which used crowd-sourcing and a Wizard-of-Oz set-up to collect written human-human dialogues for a task-oriented, multi-domain scenario. The context is a tourism agency, where users try to select the more desired hotel, restaurant, museum or shop. To respond to users, wizards were assisted by an exploratory system supporting Preference-enriched Faceted Search. An important aspect was the translation of user intent to a number of actions (hard or soft-constraints) by wizards. The main goal was to collect dialogues as realistic as possible between a user and an operator, suitable for training end-to-end dialogue systems. This work describes the experiences made, the options and the decisions taken to minimize the human effort and budget, along with the tools used and developed, and describes in detail the resulting dialogue collection.

[1]  Stefan Ultes,et al.  MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling , 2018, EMNLP.

[2]  Alessandro Acquisti,et al.  Beyond the Turk: An Empirical Comparison of Alternative Platforms for Crowdsourcing Online Behavioral Research , 2016 .

[3]  Hang Li,et al.  Neural Responding Machine for Short-Text Conversation , 2015, ACL.

[4]  A. Acquisti,et al.  Beyond the Turk: Alternative Platforms for Crowdsourcing Behavioral Research , 2016 .

[5]  Yannis Tzitzikas,et al.  Hippalus: Preference-enriched Faceted Exploration , 2014, EDBT/ICDT Workshops.

[6]  Quoc V. Le,et al.  A Neural Conversational Model , 2015, ArXiv.

[7]  Alan Ritter,et al.  Unsupervised Modeling of Twitter Conversations , 2010, NAACL.

[8]  Joelle Pineau,et al.  A Survey of Available Corpora for Building Data-Driven Dialogue Systems , 2015, Dialogue Discourse.

[9]  Matthew Lease,et al.  Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms , 2015 .

[10]  Cecilia Ovesdotter Alm,et al.  An Analysis of Domestic Abuse Discourse on Reddit , 2015, EMNLP.

[11]  David Vandyke,et al.  A Network-based End-to-End Trainable Task-oriented Dialogue System , 2016, EACL.

[12]  Yannis Tzitzikas,et al.  Interactive Exploration of Multi-Dimensional and Hierarchical Information Spaces with Real-Time Preference Elicitation , 2013, Fundam. Informaticae.

[13]  Hannes Schulz,et al.  Frames: a corpus for adding memory to goal-oriented dialogue systems , 2017, SIGDIAL Conference.

[14]  Alessandro Acquisti,et al.  Beyond the Turk: an Empirical Comparison of Alternative Platforms For Crowdsourcing Online Research , 2015 .

[15]  Jianfeng Gao,et al.  Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems , 2018, ArXiv.

[16]  Joelle Pineau,et al.  The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems , 2015, SIGDIAL Conference.

[17]  Stefan Petrik,et al.  Wizard of Oz Experiments on Speech Dialogue Systems Design and Realisation with a New Integrated Simulation Environment , 2004 .

[18]  Tsung-Hsien Wen,et al.  Neural Belief Tracker: Data-Driven Dialogue State Tracking , 2016, ACL.

[19]  Arne Jönsson,et al.  Wizard of Oz studies: why and how , 1993, IUI '93.

[20]  Joelle Pineau,et al.  Hierarchical Neural Network Generative Models for Movie Dialogues , 2015, ArXiv.

[21]  J. F. Kelley,et al.  An iterative design methodology for user-friendly natural language office information applications , 1984, TOIS.