Crowdsourcing Language Generation Templates for Dialogue Systems

We explore the use of crowdsourcing to generate natural language in spoken dialogue systems. We introduce a methodology for eliciting novel templates from the crowd based on a dialogue seed corpus, and investigate how the amount of surrounding dialogue context shown to workers affects the generation task. Evaluation is performed both with a crowd and with a system developer to assess the naturalness and suitability of the elicited phrases. Results indicate that, within this methodology, the crowd can provide reasonable and diverse templates; however, more work is needed before elicited templates can be plugged into the system automatically.
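To make the elicitation setup concrete, the sketch below shows one way a crowdsourcing prompt could be assembled from a dialogue seed corpus while varying how many preceding turns of context the worker sees. This is a minimal illustration, not the authors' implementation; the data structures and the build_hit_prompt helper are hypothetical, and the actual task design, wording, and context conditions are those described in the paper.

```python
# Hypothetical sketch: building a crowd task prompt that asks workers to
# rephrase a target system turn, with a configurable amount of preceding
# dialogue context. All names here are illustrative assumptions.

from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str  # "SYSTEM" or "USER"
    text: str     # surface form taken from the dialogue seed corpus


def build_hit_prompt(dialogue: List[Turn], target_index: int, context_turns: int) -> str:
    """Show `context_turns` turns preceding the target system turn, then ask
    the worker to produce a natural rephrasing of that turn."""
    start = max(0, target_index - context_turns)
    context = "\n".join(f"{t.speaker}: {t.text}" for t in dialogue[start:target_index])
    target = dialogue[target_index].text
    return (
        "Dialogue so far:\n"
        f"{context or '(no context shown)'}\n\n"
        f"The system wants to say: \"{target}\"\n"
        "Please write a natural rephrasing that fits the conversation."
    )


if __name__ == "__main__":
    seed = [
        Turn("USER", "I need directions to the cafeteria."),
        Turn("SYSTEM", "The cafeteria is on the second floor, next to the elevators."),
    ]
    # Contrast a zero-context condition with a one-turn-context condition
    # for the same target system turn.
    print(build_hit_prompt(seed, target_index=1, context_turns=0))
    print(build_hit_prompt(seed, target_index=1, context_turns=1))
```

Collected rephrasings would then need delexicalization (e.g., replacing "the cafeteria" with a slot) and quality filtering before they could serve as generation templates; the paper's evaluation suggests this last step is where manual effort remains necessary.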
