Robust and energy-efficient expression recognition based on improved deep ResNets

Abstract To improve the robustness and reduce the energy consumption of facial expression recognition, this study proposes a facial expression recognition method based on improved deep residual networks (ResNets). Residual learning has solved the degradation problem of deep Convolutional Neural Networks (CNNs); therefore, in theory, a ResNet can consist of an arbitrarily large number of layers. On the one hand, ResNets achieve better performance on artificial intelligence (AI) tasks thanks to their deeper network structure; on the other hand, they face a severe energy consumption problem, especially on mobile devices. Hence, this study employs a novel activation function, the Noisy Softplus (NSP), in place of rectified linear units (ReLU) to obtain improved ResNets. NSP is a biologically plausible activation function that was first proposed for training Spiking Neural Networks (SNNs); thus, NSP-trained models can be directly implemented on ultra-low-power neuromorphic hardware. We built an 18-layer ResNet using NSP to perform facial expression recognition on the Cohn-Kanade (CK+), Karolinska Directed Emotional Faces (KDEF) and GENKI-4K datasets. The resulting models showed better noise robustness than a ResNet using the ReLU activation function, and low energy consumption when running on neuromorphic hardware. This study not only contributes a solution for robust facial expression recognition, but also demonstrates the low energy cost of implementing it on neuromorphic devices, which could pave the way for high-performance, noise-robust and energy-efficient vision applications on mobile hardware.
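To make the substitution of NSP for ReLU concrete, the following is a minimal sketch of the two activation functions side by side. The abstract does not give the NSP formula; the sketch assumes the form usually written for Noisy Softplus, y = k·σ·log(1 + exp(x / (k·σ))), where σ is the noise level of the input and k is a scaling constant. The function names and the default `k=0.2` are illustrative assumptions, not values taken from this study.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit, the baseline activation."""
    return max(x, 0.0)


def noisy_softplus(x: float, sigma: float, k: float = 0.2) -> float:
    """Assumed Noisy Softplus form: k*sigma*log(1 + exp(x / (k*sigma))).

    sigma is the input noise level; k is a scaling constant
    (the default here is illustrative only).
    """
    scale = k * sigma
    if scale <= 0:
        raise ValueError("k * sigma must be positive")
    z = x / scale
    # Numerically stable softplus: log(1 + e^z) = max(z, 0) + log1p(e^-|z|)
    return scale * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


if __name__ == "__main__":
    # Compare the two activations at a few inputs and noise levels.
    for sigma in (0.01, 0.5, 1.0):
        for x in (-1.0, 0.0, 1.0):
            print(f"sigma={sigma:4.2f} x={x:+.1f} "
                  f"relu={relu(x):.4f} nsp={noisy_softplus(x, sigma):.4f}")
```

Under this assumed form, the function approaches ReLU as σ tends to zero and becomes a smoother curve as σ grows, so the noise level acts as a second input to the activation instead of being ignored as it is in ReLU.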
