Intrinsic and Extrinsic Evaluation of an Automatic User Disengagement Detector for an Uncertainty-Adaptive Spoken Dialogue System

We present a model for detecting user disengagement during spoken dialogue interactions. Intrinsic evaluation of our model (i.e., against a gold standard) yields results on par with prior work. However, because our goal is immediate deployment in a system that already detects and adapts to user uncertainty, we go further than prior work and present an extrinsic evaluation of our model (i.e., with respect to the real-world task). Crucially, correlation analyses show that our automatic disengagement labels correlate with system performance in the same way as the gold-standard (manual) labels, while regression analyses show that, when modeling performance, detecting user disengagement adds value over and above detecting user uncertainty alone. Our results suggest that automatically detecting and adapting to user disengagement has the potential to significantly improve performance, even in the presence of noise, compared with adapting to only one affective state or ignoring affect entirely.