Covariance Pooling for Facial Expression Recognition

Classifying facial expressions into different categories requires capturing regional distortions of facial landmarks. We believe that second-order statistics such as covariance is better able to capture such distortions in regional facial features. In this work, we explore the benefits of using a manifold network structure for covariance pooling to improve facial expression recognition. In particular, we first employ such kind of manifold networks in conjunction with traditional convolutional networks for spatial pooling within individual image feature maps in an end-to-end deep learning manner. By doing so, we are able to achieve a recognition accuracy of 58.14% on the validation set of Static Facial Expressions in the Wild (SFEW2.0) and 87.0% on the validation set of Real-World Affective Faces (RAF) Database1. Both of these results are the best results we are aware of. Besides, we leverage covariance pooling to capture the temporal evolution of per-frame features for video-based facial expression recognition. Our reported results demonstrate the advantage of pooling image-set features temporally by stacking the designed manifold network of covariance pooling on top of convolutional network layers.

[1]  Takeo Kanade,et al.  Comprehensive database for facial expression analysis , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[2]  Fatih Murat Porikli,et al.  Region Covariance: A Fast Descriptor for Detection and Classification , 2006, ECCV.

[3]  Takeo Kanade,et al.  The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[4]  Tamás D. Gedeon,et al.  Collecting Large, Richly Annotated Facial-Expression Databases from Movies , 2012, IEEE MultiMedia.

[5]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[6]  Shiguang Shan,et al.  Combining Multiple Kernel Methods on Riemannian Manifold for Emotion Recognition in the Wild , 2014, ICMI.

[7]  Tamás D. Gedeon,et al.  Emotion Recognition In The Wild Challenge 2014: Baseline, Data and Protocol , 2014, ICMI.

[8]  Yurong Chen,et al.  Capturing AU-Aware Facial Features and Their Latent Relations for Emotion Recognition in the Wild , 2015, ICMI.

[9]  Cristian Sminchisescu,et al.  Matrix Backpropagation for Deep Networks with Structured Layers , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Christopher Joseph Pal,et al.  Recurrent Neural Networks for Emotion Recognition in Video , 2015, ICMI.

[11]  Shiguang Shan,et al.  Log-Euclidean Metric Learning on Symmetric Positive Definite Manifold with Application to Image Set Classification , 2015, ICML.

[12]  Cha Zhang,et al.  Image based Static Facial Expression Recognition with Multiple Deep Network Learning , 2015, ICMI.

[13]  Soo-Young Lee,et al.  Hierarchical Committee of Deep CNNs with Exponentially-Weighted Decision Fusion for Static Facial Expression Recognition , 2015, ICMI.

[14]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[15]  Tong Zhang,et al.  Multi-clue fusion for emotion recognition in the wild , 2016, ICMI.

[16]  Yuanliu Liu,et al.  Video-based emotion recognition using CNN-RNN and C3D hybrid networks , 2016, ICMI.

[17]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[18]  Aleix M. Martínez,et al.  EmotioNet: An Accurate, Real-Time Algorithm for the Automatic Annotation of a Million Facial Expressions in the Wild , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Andrea Vedaldi,et al.  Learning Grimaces by Watching TV , 2016, BMVC.

[20]  Emad Barsoum,et al.  Emotion recognition in the wild from videos using images , 2016, ICMI.

[21]  Rama Chellappa,et al.  FaceNet2ExpNet: Regularizing a Deep Face Recognition Net for Expression Recognition , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[22]  A. Yuille,et al.  Regularizing Face Net For Discrete-Valued Pain Regression , 2017 .

[23]  Gregory D. Hager,et al.  Regularizing face verification nets for pain intensity regression , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[24]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[25]  Luc Van Gool,et al.  A Riemannian Network for SPD Matrix Learning , 2016, AAAI.

[26]  Junping Du,et al.  Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression Recognition in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Mathieu Salzmann,et al.  Second-order Convolutional Neural Networks , 2017, ArXiv.