CONTEMPORARY MULTIMODAL DATA COLLECTION METHODOLOGY FOR RELIABLE INFERENCE OF AUTHENTIC SURPRISE

The need for intelligent systems that can understand and convey human emotional expression is increasingly pressing. Unfortunately, most datasets for developing such systems rely on acted or exaggerated emotions, or on subjective labels obtained from potentially unreliable sources. This paper reports on a novel data collection methodology for capturing multimodal human signals of authentic surprise. We introduce two facilitator-led tasks that elicit genuine reactions of surprise while simultaneously collecting data from three human modalities: speech, facial expressions, and galvanic skin response. Our work highlights the methodological potential of validation based on biophysical measurement for enabling reliable inference. We present a case study that provides baseline results for Random Forest classification. Using features drawn from the three modalities, our baseline system identifies surprise instances with approximately a 20% absolute increase in accuracy over random assignment on a balanced dataset.
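To make the baseline concrete, the sketch below trains a Random Forest on concatenated per-instance feature vectors from the three modalities and reports cross-validated accuracy against the 50% chance level of a balanced binary task. It uses scikit-learn; the feature dimensionalities, the synthetic placeholder arrays, and the 100-tree configuration are illustrative assumptions for exposition, not the authors' exact setup.

    # Minimal sketch of a Random Forest baseline for binary surprise detection.
    # Feature extraction is assumed to have already produced fixed-length
    # vectors per instance; all names and sizes here are illustrative.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    # Placeholder multimodal features, e.g. prosodic statistics (speech),
    # facial-expression descriptors (face), and skin-conductance summaries
    # (galvanic skin response), concatenated into one feature matrix.
    n_instances = 200
    speech_feats = rng.normal(size=(n_instances, 12))
    face_feats = rng.normal(size=(n_instances, 17))
    gsr_feats = rng.normal(size=(n_instances, 4))
    X = np.hstack([speech_feats, face_feats, gsr_feats])

    # Balanced labels: 1 = surprise, 0 = non-surprise (chance accuracy = 0.5).
    y = np.repeat([0, 1], n_instances // 2)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"mean cross-validated accuracy: {scores.mean():.3f}")

On real features, per-modality blocks would replace the random placeholders, and comparing the cross-validated mean against the 0.5 chance level corresponds to the absolute-increase-over-random measure quoted above.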
