Guardian: A Crowd-Powered Spoken Dialog System for Web APIs

Natural language dialog is an important and intuitive way for people to access information and services. However, current dialog systems are limited in scope, brittle when faced with the richness of natural language, and expensive to produce. This paper introduces Guardian, a crowd-powered framework that wraps existing Web APIs into immediately usable spoken dialog systems. Guardian takes as input a Web API and a desired task; the crowd determines the parameters necessary to complete the task, decides how to ask for them, and interprets the responses from the API. The system is structured so that, over time, it can learn to take over for the crowd. This hybrid approach will help make dialog systems both more general and more robust going forward.
