STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits

We present STEP, a novel classifier network that categorizes perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits gait features to classify the individual's emotional state into one of four emotions: happy, sad, angry, or neutral. We use hundreds of annotated real-world gait videos and augment them with thousands of annotated synthetic gaits generated by a novel generative network, STEP-Gen, built on an ST-GCN-based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss into the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of 2,177 human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP learns affective features and achieves a classification accuracy of 89% on E-Gait, 14-30% higher than prior methods.
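The abstract does not give the exact form of the push-pull regularized CVAE objective, but the idea can be illustrated with a minimal NumPy sketch. The structure below is an assumption: a standard CVAE loss (reconstruction error plus a closed-form Gaussian KL term), extended with a hypothetical push-pull term that pulls latent codes of same-emotion gaits together and pushes different-emotion codes apart with a hinge margin. Function names, the margin value, and the weighting `lam` are illustrative, not taken from the paper.

```python
import numpy as np

def kl_div_gaussian(mu, logvar):
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

def push_pull_term(z, labels, margin=1.0):
    # Hypothetical push-pull regularizer over latent codes:
    #   pull: same emotion class -> minimize pairwise distance
    #   push: different classes  -> hinge penalty if closer than margin
    pull, push = 0.0, 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.sum((z[i] - z[j]) ** 2)
            if labels[i] == labels[j]:
                pull += d
            else:
                push += max(0.0, margin - d)
    return pull + push

def cvae_loss(x, x_recon, mu, logvar, z, labels, lam=0.1):
    recon = np.sum((x - x_recon) ** 2)  # squared reconstruction error
    return recon + kl_div_gaussian(mu, logvar) + lam * push_pull_term(z, labels)

# Toy example: 4 gait feature vectors, 2 emotion classes
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
x_recon = x + 0.01 * rng.normal(size=x.shape)   # near-perfect reconstruction
mu = 0.1 * rng.normal(size=(4, 8))
logvar = np.zeros((4, 8))
z = rng.normal(size=(4, 8))
labels = np.array([0, 0, 1, 1])
loss = cvae_loss(x, x_recon, mu, logvar, z, labels)
```

Every term in this toy objective is non-negative, so the loss is bounded below by zero; in the paper's setting the regularizer's role is to shape the CVAE latent space so that sampled synthetic gaits stay emotion-consistent, which in turn benefits the downstream STEP classifier.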
