What if everyone could do it?: a framework for easier spoken dialog system design

While Graphical User Interfaces (GUIs) remain the most common way of operating modern computing technology, Spoken Dialog Systems (SDSs) have the potential to offer a more natural and intuitive mode of interaction. Although existing speech recognition is sometimes dismissed as neither reliable nor practical, the success of recent products such as Apple's Siri or Nuance's Dragon Drive suggests that language-based interaction is gaining acceptance. Yet, unlike applications for building GUIs, tools and frameworks that support the design, construction, and maintenance of dialog systems are rare. A particular challenge of SDS design is the often complex integration of technologies: systems usually consist of several components (e.g. speech recognition, language understanding, output generation), each of which requires expertise to deploy in a given application domain. This paper presents work in progress that aims to support this integration process. We propose a framework of components and describe how it may be used to prototype and gradually implement a spoken dialog system without requiring extensive domain expertise.
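The component chain described above (speech recognition feeding language understanding, dialog management, and output generation) can be pictured as a simple pipeline in which each stage is independently replaceable. The sketch below is purely illustrative and not from the paper: the `DialogPipeline` class and the stub components (`fake_asr`, `keyword_nlu`, `rule_dm`) are hypothetical names, showing how trivial placeholder stages could later be swapped for full technology integrations.

```python
from typing import Callable, List

class DialogPipeline:
    """Chains SDS components (e.g. ASR -> NLU -> DM) as plain callables,
    so each stage can start as a stub and be replaced incrementally."""

    def __init__(self, stages: List[Callable[[str], str]]):
        self.stages = stages

    def run(self, user_input: str) -> str:
        # Pass the output of each stage to the next one in order.
        for stage in self.stages:
            user_input = stage(user_input)
        return user_input

# Stub components standing in for real technology integrations.
def fake_asr(audio_text: str) -> str:
    # Placeholder "recognizer": just normalizes the text it is given.
    return audio_text.lower()

def keyword_nlu(text: str) -> str:
    # Placeholder understanding component: keyword spotting for one intent.
    return "intent:greet" if "hello" in text else "intent:unknown"

def rule_dm(intent: str) -> str:
    # Placeholder dialog manager: a fixed intent-to-response table.
    responses = {"intent:greet": "Hello! How can I help?"}
    return responses.get(intent, "Sorry, I did not understand.")

pipeline = DialogPipeline([fake_asr, keyword_nlu, rule_dm])
print(pipeline.run("Hello there"))  # prints "Hello! How can I help?"
```

Because every stage shares the same callable interface, a prototype built from stubs like these could be hardened one component at a time, which mirrors the gradual implementation process the abstract describes.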
