A Theoretical Framework for a User-Centered Spoken Dialog Manager

Dialog strategies have long been handcrafted by dialog experts. Only within the last decade has research moved to data-driven methods that yield statistical models. Still, most dialog systems rely solely on the spoken words and their semantics, although the speech signal reveals much more about the speaker, e.g. age, gender, and emotional state. Using this speaker state information alongside the semantics is a promising way to improve the performance of dialog systems while making them more natural. Partially Observable Markov Decision Processes (POMDPs), a state-of-the-art statistical modeling method, offer a simple and unified way of integrating speaker state information into dialog systems. In this contribution we present our ongoing research on combining a POMDP-based dialog manager with speaker state information.
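To illustrate how speaker state information might enter a POMDP-based dialog manager, the following minimal sketch performs one belief update over a hidden state made up of a user goal and a speaker emotion. It is not the authors' model: the state space, all probabilities, the function names, and the assumption that the semantic and speaker state observations are conditionally independent given the state are hypothetical choices for illustration. The dependence on the system action is omitted for brevity.

```python
from itertools import product

# Hypothetical toy state space: each hidden state is (user goal, speaker emotion).
GOALS = ["billing", "support"]
EMOTIONS = ["neutral", "angry"]
STATES = list(product(GOALS, EMOTIONS))


def transition(prev, nxt):
    """P(s' | s): assumes the goal stays fixed and the emotion persists with prob. 0.8."""
    (goal0, emo0), (goal1, emo1) = prev, nxt
    if goal0 != goal1:
        return 0.0
    return 0.8 if emo0 == emo1 else 0.2


def obs_semantic(obs_goal, state):
    """P(o_sem | s'): assumed recognizer accuracy of 0.7 for the user goal."""
    return 0.7 if obs_goal == state[0] else 0.3


def obs_speaker(obs_emotion, state):
    """P(o_spk | s'): assumed emotion classifier accuracy of 0.6."""
    return 0.6 if obs_emotion == state[1] else 0.4


def belief_update(belief, obs_goal, obs_emotion):
    """b'(s') is proportional to P(o_sem|s') * P(o_spk|s') * sum_s P(s'|s) * b(s)."""
    new_belief = {}
    for s1 in STATES:
        predicted = sum(transition(s0, s1) * belief[s0] for s0 in STATES)
        likelihood = obs_semantic(obs_goal, s1) * obs_speaker(obs_emotion, s1)
        new_belief[s1] = likelihood * predicted
    total = sum(new_belief.values())
    return {s: p / total for s, p in new_belief.items()}


# Start from a uniform belief and observe "billing" spoken in an angry voice.
uniform = {s: 1.0 / len(STATES) for s in STATES}
updated = belief_update(uniform, "billing", "angry")
```

Under these assumed numbers, the belief mass shifts towards the state ("billing", "angry"), so a policy defined over the belief could react to both what was said and how it was said.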
