ARBEE: Towards Automated Recognition of Bodily Expression of Emotion in the Wild

Humans are arguably innately prepared to comprehend others' emotional expressions from subtle body movements. If robots or computers can be empowered with this capability, a number of robotic applications become possible. Automatically recognizing human bodily expression in unconstrained situations, however, is daunting given the incomplete understanding of the relationship between emotional expressions and body movements. The current research, as a multidisciplinary effort among computer and information sciences, psychology, and statistics, proposes a scalable and reliable crowdsourcing approach for collecting in-the-wild perceived emotion data so that computers can learn to recognize the body language of humans. To accomplish this task, a large and growing annotated dataset with 9,876 video clips of body movements and 13,239 human characters, named the Body Language Dataset (BoLD), has been created. Comprehensive statistical analysis of the dataset revealed many interesting insights. A system to model emotional expression based on bodily movements, named Automated Recognition of Bodily Expression of Emotion (ARBEE), has also been developed and evaluated. Our analysis shows the effectiveness of Laban Movement Analysis (LMA) features in characterizing arousal, and our experiments using LMA features further demonstrate the computability of bodily expression. We report and compare the results of several other baseline methods developed for action recognition, based on two different modalities: body skeleton and raw image. The dataset and findings presented in this work will likely serve as a launchpad for future discoveries in body language understanding, enabling future robots to interact and collaborate more effectively with humans.
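To make the LMA-based pipeline concrete, the following is a minimal sketch, not the authors' implementation: it computes a few LMA-inspired motion descriptors (mean joint speed, acceleration, and pose spread) from 2D skeleton sequences and fits a simple regressor for perceived arousal. The joint layout, the specific features, the synthetic data, and the Ridge regressor are all assumptions made for illustration; in ARBEE the pose sequences would come from a pose estimator run on BoLD clips and the targets from crowdsourced ratings.

```python
# Illustrative sketch only: LMA-inspired features from 2D pose sequences,
# followed by a simple arousal regressor. Feature choices are assumptions.
import numpy as np
from sklearn.linear_model import Ridge

def lma_effort_features(poses):
    """poses: array of shape (T, J, 2) -- T frames, J joints, (x, y) coordinates.
    Returns a small feature vector loosely inspired by LMA Effort/Shape qualities:
    mean and peak joint speed, mean joint acceleration, and spatial spread of the pose."""
    velocity = np.diff(poses, axis=0)                  # (T-1, J, 2) frame-to-frame displacement
    speed = np.linalg.norm(velocity, axis=-1)          # (T-1, J) per-joint speed
    acceleration = np.diff(velocity, axis=0)           # (T-2, J, 2)
    accel_mag = np.linalg.norm(acceleration, axis=-1)  # (T-2, J)
    spread = poses.std(axis=1).mean()                  # crude proxy for how expanded the posture is
    return np.array([speed.mean(), speed.max(), accel_mag.mean(), spread])

# Toy usage with synthetic clips; real poses would come from a pose estimator
# (e.g., OpenPose-style keypoints) applied to each video clip.
rng = np.random.default_rng(0)
clips = [rng.normal(size=(60, 18, 2)).cumsum(axis=0) for _ in range(32)]  # 32 fake 60-frame clips
X = np.stack([lma_effort_features(p) for p in clips])
y = rng.uniform(0.0, 1.0, size=len(clips))  # placeholder crowdsourced arousal ratings

model = Ridge(alpha=1.0).fit(X, y)
print("Predicted arousal for first clip:", model.predict(X[:1])[0])
```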
