STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits

We present STEP, a novel classifier network that categorizes perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits gait features to classify the individual's emotional state into one of four emotions: happy, sad, angry, or neutral. We use hundreds of annotated real-world gait videos and augment them with thousands of annotated synthetic gaits generated by a novel generative network, STEP-Gen, built on an ST-GCN-based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss into the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of 2,177 human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP learns affective features and achieves a classification accuracy of 89% on E-Gait, 14-30% higher than prior methods.
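The abstract does not give the exact form of the push-pull regularized CVAE objective, but the idea can be illustrated with a minimal NumPy sketch. The structure below is an assumption: a standard CVAE loss (reconstruction error plus a closed-form Gaussian KL term), extended with a hypothetical push-pull term that pulls latent codes of same-emotion gaits together and pushes different-emotion codes apart with a hinge margin. Function names, the margin value, and the weighting `lam` are illustrative, not taken from the paper.

```python
import numpy as np

def kl_div_gaussian(mu, logvar):
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

def push_pull_term(z, labels, margin=1.0):
    # Hypothetical push-pull regularizer over latent codes:
    #   pull: same emotion class -> minimize pairwise distance
    #   push: different classes  -> hinge penalty if closer than margin
    pull, push = 0.0, 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.sum((z[i] - z[j]) ** 2)
            if labels[i] == labels[j]:
                pull += d
            else:
                push += max(0.0, margin - d)
    return pull + push

def cvae_loss(x, x_recon, mu, logvar, z, labels, lam=0.1):
    recon = np.sum((x - x_recon) ** 2)  # squared reconstruction error
    return recon + kl_div_gaussian(mu, logvar) + lam * push_pull_term(z, labels)

# Toy example: 4 gait feature vectors, 2 emotion classes
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
x_recon = x + 0.01 * rng.normal(size=x.shape)   # near-perfect reconstruction
mu = 0.1 * rng.normal(size=(4, 8))
logvar = np.zeros((4, 8))
z = rng.normal(size=(4, 8))
labels = np.array([0, 0, 1, 1])
loss = cvae_loss(x, x_recon, mu, logvar, z, labels)
```

Every term in this toy objective is non-negative, so the loss is bounded below by zero; in the paper's setting the regularizer's role is to shape the CVAE latent space so that sampled synthetic gaits stay emotion-consistent, which in turn benefits the downstream STEP classifier.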
