Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors

While human tutors respond both to what a student says and to how the student says it, most tutorial dialogue systems cannot detect the student emotions and attitudes underlying an utterance. We present an empirical study investigating the feasibility of recognizing student state in two corpora of spoken tutoring dialogues, one with a human tutor and one with a computer tutor. We first annotate student turns for negative, neutral, and positive student states in both corpora. We then automatically extract acoustic–prosodic features from the student speech, and lexical items from the transcribed or recognized speech. We compare the results of machine learning experiments using these features alone, in combination, and with student- and task-dependent features, to predict student states. We also compare our results across human–human and human–computer spoken tutoring dialogues. Our results show significant improvements in prediction accuracy over relevant baselines, and provide a first step towards enhancing our intelligent tutoring spoken dialogue system to automatically recognize and adapt to student states.
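The pipeline the abstract describes, extracting turn-level acoustic-prosodic features, combining them with lexical features, and learning to predict negative/neutral/positive student state, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual system: the feature names, the tiny nearest-centroid learner, and the example utterances are all hypothetical stand-ins for the paper's feature set and machine-learning setup.

```python
# Illustrative sketch of the pipeline: prosodic + lexical features per
# student turn, combined and fed to a simple classifier. All names and
# data below are invented for illustration.
from collections import Counter
import math

def prosodic_features(pitch_track, duration_s):
    """Turn-level statistics over an f0 track, plus turn duration."""
    return {
        "f0_mean": sum(pitch_track) / len(pitch_track),
        "f0_max": max(pitch_track),
        "duration": duration_s,
    }

def lexical_features(transcript, vocab):
    """Bag-of-words counts restricted to a fixed vocabulary."""
    counts = Counter(transcript.lower().split())
    return {f"w_{w}": counts[w] for w in vocab}

def combine(*feature_dicts):
    """Merge feature sources into one feature vector (dict)."""
    merged = {}
    for d in feature_dicts:
        merged.update(d)
    return merged

class NearestCentroid:
    """Tiny stand-in learner: one mean vector per class label."""
    def fit(self, X, y):
        self.keys = sorted({k for x in X for k in x})
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(self.keys))
            for i, k in enumerate(self.keys):
                acc[i] += x.get(k, 0.0)
        self.centroids = {
            label: [v / counts[label] for v in acc]
            for label, acc in sums.items()
        }
        return self

    def predict(self, x):
        vec = [x.get(k, 0.0) for k in self.keys]
        return min(self.centroids,
                   key=lambda lbl: math.dist(vec, self.centroids[lbl]))

vocab = ["no", "yes", "great"]
X = [
    combine(prosodic_features([90, 95], 4.0),
            lexical_features("no i do not know", vocab)),
    combine(prosodic_features([120, 125], 1.5),
            lexical_features("yes that is right", vocab)),
    combine(prosodic_features([110, 112], 2.0),
            lexical_features("the force is constant", vocab)),
]
y = ["negative", "positive", "neutral"]

clf = NearestCentroid().fit(X, y)
turn = combine(prosodic_features([92, 96], 3.8),
               lexical_features("no idea", vocab))
pred = clf.predict(turn)  # low pitch, long turn, "no" -> "negative"
```

In the study itself, predictions like `pred` would be evaluated against annotated student states and compared to a majority-class baseline, both per feature set (acoustic-prosodic alone, lexical alone) and for the combined set.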