Pair Me Up: A Web Framework for Crowd-Sourced Spoken Dialogue Collection

We describe and analyze a new web-based spoken dialogue data collection framework. The framework captures conversational speech from two remote users who converse with each other and play a dialogue game entirely through their web browsers. We report the substantial improvements in the speed and cost of data capture that we have observed with this crowd-sourced paradigm. We also analyze a range of data quality factors by comparing a crowd-sourced data set involving 196 remote users with a smaller but more quality-controlled lab-based data set. We focus our comparison on aspects that are especially important in our spoken dialogue research: audio quality, the effect of communication latency on the interaction, our ability to synchronize the collected data, our ability to collect examples of excellent game play, and the naturalness of the resulting interactions. This analysis illustrates some of the current trade-offs between lab-based and crowd-sourced spoken dialogue data.
