A Multi-Modal Approach to Sensing Human Emotion

We are witnessing a revolution in body area sensing, with applications ranging from biometric security to personalized healthcare to sports performance training. The key application driver has been the emergence of wireless and contact-free technologies for sensing human physiology, motion, and posture. We posit an analogous revolution enabled by advances in the sensing of human emotion, with applications that are similarly diverse: social skills education, business intelligence, monitoring of doctor-patient dynamics, and many more. This paper explores the sensing foundations needed for reliable detection and classification of human emotion. The approach combines sensing of speech characteristics, natural language processing, facial landmark monitoring, and machine learning.
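
The abstract does not specify how the modalities are combined, but a common pattern for such pipelines is feature-level (early) fusion: extract a feature vector per modality, concatenate, and train a single classifier. The sketch below illustrates that pattern only; the feature dimensions, the synthetic data, and the choice of classifier are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical per-modality feature dimensions:
#   speech - e.g., an 88-dim GeMAPS-style acoustic parameter vector
#   text   - e.g., a small set of lexical/sentiment features
#   face   - e.g., 68 facial landmarks flattened to (x, y) pairs
N_SPEECH, N_TEXT, N_FACE = 88, 16, 136

def fuse(speech_vec, text_vec, face_vec):
    """Feature-level (early) fusion: concatenate modality vectors."""
    return np.concatenate([speech_vec, text_vec, face_vec])

# Synthetic stand-in data; real features would come from acoustic,
# NLP, and landmark extractors. Accuracy here is chance-level.
n_samples, n_classes = 400, 4
X = np.stack([
    fuse(rng.normal(size=N_SPEECH),
         rng.normal(size=N_TEXT),
         rng.normal(size=N_FACE))
    for _ in range(n_samples)
])
y = rng.integers(0, n_classes, size=n_samples)  # emotion class labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A standard classifier over the fused representation.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

Decision-level (late) fusion, in which each modality gets its own classifier and the per-modality predictions are combined, is a common alternative when one modality may be missing or unreliable; which scheme the paper uses is not stated in the abstract.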
