On the Influence of an Iterative Affect Annotation Approach on Inter-Observer and Self-Observer Reliability

Affect detection systems require reliable methods for annotating affective data. Typically, two or more observers independently annotate audio-visual affective data. This approach yields inter-observer reliabilities that can be categorized as fair (Cohen's kappa values of approximately .40). In an alternative, iterative approach, observers independently annotate a small sample of data, discuss their annotations, and then annotate a different sample. Once a predetermined reliability threshold is reached, the observers independently annotate the remainder of the data. The effectiveness of the iterative approach was tested in an annotation study in which pairs of observers annotated affective video data in nine annotate-discuss iterations. Self-annotations had previously been collected on the same data. Mixed-effects linear regression models indicated that inter-observer agreement increased across iterations (unstandardized coefficient B = .031), with agreement in the final iteration reflecting a 64 percent improvement over the first iteration. Follow-up analyses indicated that the improvement was nonlinear: most of it occurred over the first three iterations (B = .043), after which agreement plateaued (B ≈ 0). There was no notable complementary improvement (B ≈ 0) in self-observer agreement, which was considerably lower than inter-observer agreement. Strengths, limitations, and applications of the iterative affect annotation approach are discussed.
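The reliability statistic, the stopping rule, and the percent-improvement figure mentioned above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's implementation: `cohens_kappa`, `first_iteration_meeting`, and `percent_improvement` are hypothetical helper names, each observer is assumed to assign one categorical affect label per item, and the threshold is whatever value a team fixes before annotation begins. All numbers in the usage note are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two observers who labelled the same items (hypothetical helper)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    # Observed agreement: fraction of items on which both observers chose the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over labels of the product of each observer's marginal rate.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1.0:
        # Both observers used a single identical label: kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


def first_iteration_meeting(kappas: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based annotate-discuss iteration whose kappa first reaches
    `threshold`, or None if no iteration does (hypothetical stopping rule)."""
    for i, k in enumerate(kappas, start=1):
        if k >= threshold:
            return i
    return None


def percent_improvement(first: float, last: float) -> float:
    """Relative change from the first to the last iteration, in percent."""
    return 100.0 * (last - first) / first
```

For example, with the invented labels `["bored", "engaged", "engaged", "confused"]` and `["bored", "engaged", "confused", "confused"]`, observed agreement is .75 and chance agreement is .3125, so kappa is 7/11 (about .64). A hypothetical first-iteration kappa of .25 rising to .41 corresponds to a 64 percent improvement, because the final value is 1.64 times the first.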
