LBVCNN: Local Binary Volume Convolutional Neural Network for Facial Expression Recognition From Image Sequences

Recognizing facial expressions is one of the central problems in computer vision. Temporal image sequences carry spatio-temporal features that are useful for recognizing expressions. In this paper, we propose a new 3D Convolutional Neural Network (CNN) that can be trained end-to-end for facial expression recognition on temporal image sequences without using facial landmarks. More specifically, we propose a novel 3D convolutional layer that we call the Local Binary Volume (LBV) layer. The LBV layer, when used with our newly proposed LBVCNN network, achieves results comparable to state-of-the-art models, both landmark-based and landmark-free, on image sequences from the CK+, Oulu-CASIA, and UNBC-McMaster shoulder pain datasets. Furthermore, the LBV layer substantially reduces the number of trainable parameters: compared to a conventional 3 × 3 × 3 3D convolutional layer, it uses 27 times fewer trainable parameters.
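The 27-fold reduction can be illustrated with a short parameter count. The abstract does not spell out the internal structure of the LBV layer, so the sketch below rests on assumptions: the 3 × 3 × 3 filters are fixed (non-trainable), only a 1 × 1 × 1 convolution that mixes their responses is learned, the number of intermediate channels equals the number of input channels, and bias terms are ignored. The function names and the channel counts are hypothetical and chosen only for illustration.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3)):
    """Trainable weights of a conventional 3D convolution (bias ignored)."""
    kt, kh, kw = kernel
    return c_in * c_out * kt * kh * kw


def lbv_style_params(c_mid, c_out):
    """Trainable weights under the assumed LBV-style design: the 3D filters
    are fixed, and only a 1 x 1 x 1 convolution over c_mid responses is learned."""
    return c_mid * c_out


# Hypothetical channel counts; the ratio does not depend on them
# as long as c_mid == c_in (an assumption, not stated in the abstract).
c_in, c_out = 64, 64
conv = conv3d_params(c_in, c_out)
lbv = lbv_style_params(c_mid=c_in, c_out=c_out)
ratio = conv // lbv

print(f"conventional 3x3x3 conv: {conv} trainable weights")
print(f"assumed LBV-style layer: {lbv} trainable weights")
print(f"reduction factor: {ratio}")
```

Under these assumptions the reduction factor is simply the kernel volume, 3 · 3 · 3 = 27, which matches the figure quoted above. A different choice of intermediate channel count would change the ratio.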
