A Method for Dataset Creation for Dialogue State Classification in Voice Control Systems for the Internet of Things

In recent years, speech-based interaction has become an important means of communicating with devices in the Internet of Things (IoT). Voice control interfaces inherit all the challenges of natural language understanding and human-computer communication. In this paper, we present a methodology for creating initial training data for voice-controlled devices that supports the design and tracking of dialogue system states. In a first step, we use crowdsourcing to collect simple commands that users might give to devices. These commands are analyzed and manually classified into 50 user-system interaction scenarios. In a second step, we design a set of potential states that the system may be in after processing an initial user command, and ask crowd workers to write multi-turn dialogues between a user and the device that simulate how such a state is resolved towards completion. The resulting dataset contains 320 commands with their classification into interaction scenarios from the first step, and 640 multi-turn dialogues from the second step, generated for 12 potential system states. Finally, we present a baseline for the automatic classification of utterance types and slot types in user commands, which is important for dialogue state detection. The proposed methodology makes it possible to collect dialogues for IoT devices that cover a variety of system states and interaction patterns.
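As an illustration of the two collection steps described above, the following minimal sketch shows one way the resulting records could be represented. It is not the authors' data format: the class names, field names, scenario label, and system-state label are all hypothetical, and the example records are invented for illustration rather than taken from the dataset.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Command:
    """Step 1: a single crowdsourced user command (hypothetical schema)."""
    text: str
    scenario: str  # manually assigned interaction scenario (label is invented)


@dataclass
class Dialogue:
    """Step 2: a multi-turn dialogue written for a given system state."""
    system_state: str  # potential system state (label is invented)
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, utterance)


def dialogues_per_state(dialogues: List[Dialogue]) -> Counter:
    """Count how many dialogues were collected for each system state."""
    return Counter(d.system_state for d in dialogues)


# Illustrative records only; they are not taken from the dataset.
commands = [Command("turn on the kitchen light", scenario="switch_device_on")]
dialogues = [
    Dialogue(
        system_state="missing_parameter",
        turns=[
            ("user", "set the thermostat"),
            ("system", "To which temperature?"),
            ("user", "21 degrees"),
        ],
    ),
    Dialogue(
        system_state="missing_parameter",
        turns=[
            ("user", "play music"),
            ("system", "Which artist?"),
            ("user", "Any jazz"),
        ],
    ),
]

counts = dialogues_per_state(dialogues)
print(counts["missing_parameter"])
```

Keeping the system state as an explicit field on each dialogue makes it straightforward to check how evenly the collected dialogues cover the set of states.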
