A Deep Framework for Facial Emotion Recognition using Light Field Images

Light field cameras capture the intensity of light rays arriving from multiple directions, which allows a set of 2D images, called sub-aperture (SA) images, to be rendered. Each SA image corresponds to an observation of the scene from a slightly different viewpoint. The rich spatio-angular information captured by these cameras is exploited in this paper, for the first time, for facial emotion recognition. A deep spatio-angular fusion framework is adopted that models both intra-view (spatial) and inter-view (angular) information, using a VGG-16 convolutional neural network and a long short-term memory (LSTM) recurrent network. Based on this framework, the proposed solution arranges selected SA images into two view sequences, one horizontal and one vertical, and extracts a VGG-Face description for each SA image in them. The resulting description sequences are fed to two LSTM networks, which independently learn a horizontal and a vertical classification model. The softmax classifier scores obtained for the horizontal and vertical sequences are then fused to produce the final emotion label. A comprehensive set of experiments has been conducted on the IST-EURECOM light field face database using two assessment protocols. The adopted framework achieves superior emotion recognition performance compared with state-of-the-art benchmark methods.
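The data flow described above can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes the light field is stored as an array of shape (U, V, H, W, C), takes the central row and central column of the SA grid as the horizontal and vertical view sequences (the abstract only says that selected SA images are used), replaces the VGG-Face descriptor with a fixed random projection (`placeholder_descriptor`), runs an untrained single-layer LSTM with random weights, and fuses the two branches' softmax scores with a weighted mean. All function names, array sizes, the number of classes and the fusion weight are illustrative assumptions.

```python
import numpy as np


def view_sequences(lf, step=1):
    """Split a light field lf of shape (U, V, H, W, C) into two view sequences.
    ASSUMPTION: the central row gives the horizontal sequence and the central
    column gives the vertical one; the real SA selection may differ."""
    U, V = lf.shape[:2]
    horizontal = lf[U // 2, ::step]  # (V', H, W, C): viewpoint moves left-right
    vertical = lf[::step, V // 2]    # (U', H, W, C): viewpoint moves top-bottom
    return horizontal, vertical


def placeholder_descriptor(images, proj):
    """HYPOTHETICAL stand-in for VGG-Face: flatten each SA image and apply a
    fixed random projection. A real system would use a pretrained face CNN."""
    flat = images.reshape(images.shape[0], -1)
    return np.tanh(flat @ proj)  # (T, D): one descriptor per SA image


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_last_hidden(x, W, U, b):
    """Single-layer LSTM over x (T, D); returns the final hidden state.
    W: (D, 4H), U: (H, 4H), b: (4H,). Gate order: input, forget, cell, output."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in x:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:H])
        f = sigmoid(z[H:2 * H])
        g = np.tanh(z[2 * H:3 * H])
        o = sigmoid(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def init_branch(rng, in_dim, feat_dim, hidden, num_classes):
    """Random (untrained) parameters for one branch: descriptor + LSTM + softmax layer."""
    return {
        "proj": rng.normal(0, in_dim ** -0.5, (in_dim, feat_dim)),
        "W": rng.normal(0, 0.1, (feat_dim, 4 * hidden)),
        "U": rng.normal(0, 0.1, (hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
        "W_out": rng.normal(0, 0.1, (hidden, num_classes)),
        "b_out": np.zeros(num_classes),
    }


def branch_scores(seq, params):
    """Softmax class scores for one view sequence (horizontal or vertical)."""
    feats = placeholder_descriptor(seq, params["proj"])
    h = lstm_last_hidden(feats, params["W"], params["U"], params["b"])
    return softmax(h @ params["W_out"] + params["b_out"])


def fuse(p_h, p_v, w=0.5):
    """ASSUMPTION: weighted sum of the two branches' softmax scores (mean when
    w=0.5). Returns the fused scores and the index of the predicted class."""
    p = w * p_h + (1 - w) * p_v
    return p, int(np.argmax(p))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U = V = 9; Hs = Ws = 16; C = 3; num_classes = 4  # toy sizes, not the database's
    lf = rng.random((U, V, Hs, Ws, C))
    seq_h, seq_v = view_sequences(lf)
    in_dim = Hs * Ws * C
    p_h = branch_scores(seq_h, init_branch(rng, in_dim, 32, 16, num_classes))
    p_v = branch_scores(seq_v, init_branch(rng, in_dim, 32, 16, num_classes))
    fused, label = fuse(p_h, p_v)
    print(fused.round(3), label)
```

Keeping the two branches separate until the score level mirrors the description of independently learned horizontal and vertical models; in a trained system each branch's LSTM and output layer would be learned on labeled view sequences, and the descriptor would come from a pretrained face CNN rather than a random projection.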