Conversations in the Crowd: Collecting Data for Task-Oriented Dialog Learning

A major challenge in developing dialog systems is obtaining realistic data to train them for specific domains. We study the opportunity for using crowdsourcing methods to collect dialog datasets. Specifically, we introduce ChatCollect, a system that allows researchers to collect conversations focused on definable tasks from pairs of workers in the crowd. We demonstrate that varied and in-depth dialogs can be collected using this system, and describe ongoing work on creating a crowd-powered system for parsing semantic frames. Finally, we discuss research opportunities in using this approach to train and improve automated dialog systems.
