A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems

Recent spoken dialog systems can recognize freely spoken user input in restricted domains thanks to statistical methods in automatic speech recognition. These methods require a large number of natural language utterances to train the speech recognition engine and to assess the quality of the system. Since human speech offers many variants for a single intent, a large number of user utterances has to be elicited, and developers are therefore turning to crowdsourcing to collect this data. This paper compares three methods of eliciting multiple utterances for given semantics via crowdsourcing: with pictures, with text, and with semantic entities. Specifically, we compare the methods with regard to the amount of valid data and the linguistic variance of the utterances, proposing both a quantitative and a qualitative evaluation approach. In our study, the text-based method led to high variance in the utterances and a relatively low rate of invalid data.
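
To make the comparison concrete, below is a minimal sketch of how the quantitative side of such an evaluation could be computed, assuming each elicitation method yields a list of collected utterances together with annotator validity judgments. The metric choices here (validity rate, type-token ratio, distinct-utterance ratio) are illustrative proxies for "valid data" and "linguistic variance", not the paper's exact measures.

```python
# Hypothetical sketch: per-method validity rate and two simple
# lexical-variance proxies over crowdsourced utterances.

def corpus_stats(utterances, valid_flags):
    """Return the validity rate and two lexical-variance proxies."""
    valid = [u for u, ok in zip(utterances, valid_flags) if ok]
    validity_rate = len(valid) / len(utterances) if utterances else 0.0
    tokens = [tok for u in valid for tok in u.lower().split()]
    type_token_ratio = len(set(tokens)) / len(tokens) if tokens else 0.0
    distinct_ratio = len(set(valid)) / len(valid) if valid else 0.0
    return {
        "validity_rate": validity_rate,
        "type_token_ratio": type_token_ratio,
        "distinct_utterance_ratio": distinct_ratio,
    }

# Toy comparison of the three elicitation methods on made-up data.
methods = {
    "pictures": (["book a table", "reserve a table please"], [True, True]),
    "text": (["i would like a table for two", "table for two tonight"], [True, True]),
    "semantic_entities": (["book table two", "book table two"], [True, False]),
}
for name, (utts, flags) in methods.items():
    print(name, corpus_stats(utts, flags))
```

Higher type-token and distinct-utterance ratios would indicate more linguistic variance in a method's corpus, while the validity rate captures how much of the collected data is usable at all.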
