Robust and energy-efficient expression recognition based on improved deep ResNets

Abstract

To improve the robustness and reduce the energy consumption of facial expression recognition, this study proposes a facial expression recognition method based on improved deep residual networks (ResNets). Residual learning has solved the degradation problem of deep Convolutional Neural Networks (CNNs), so in principle a ResNet can be made arbitrarily deep. On the one hand, ResNets achieve better performance on artificial intelligence (AI) tasks thanks to their deeper network structure; on the other hand, they face a severe energy-consumption problem, especially on mobile devices. This study therefore replaces rectified linear units (ReLU) with a novel activation function, Noisy Softplus (NSP), to obtain improved ResNets. NSP is a biologically plausible activation function originally proposed for training Spiking Neural Networks (SNNs); as a result, NSP-trained models can be implemented directly on ultra-low-power neuromorphic hardware. We built an 18-layer ResNet using NSP to perform facial expression recognition on the extended Cohn-Kanade (CK+), Karolinska Directed Emotional Faces (KDEF) and GENKI-4K datasets. The NSP-based ResNet showed better noise robustness than a ResNet using ReLU, as well as low energy consumption when running on neuromorphic hardware. This study not only contributes a solution for robust facial expression recognition, but also demonstrates the low energy cost of implementing it on neuromorphic devices, which could pave the way for high-performance, noise-robust and energy-efficient vision applications on mobile hardware.
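The key change described above is swapping ReLU for Noisy Softplus inside residual blocks. The following plain-Python sketch illustrates that swap on a toy fully connected block; it is not the authors' 18-layer convolutional model. It assumes the form usually given for NSP, f(x, σ) = kσ·log(1 + exp(x/(kσ))), where x is the mean input, σ the input noise level and k a neuron-dependent constant. Here σ is treated as a fixed hyperparameter and k = 0.2 is purely illustrative. The helpers `noisy_softplus`, `dense` and `residual_block` are hypothetical names. As σ → 0 the curve approaches ReLU, and with kσ = 1 it reduces to the ordinary softplus.

```python
import math
from functools import partial
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def noisy_softplus(x: float, sigma: float, k: float = 0.2) -> float:
    """Noisy Softplus, ASSUMED form: k*sigma*log(1 + exp(x / (k*sigma))).

    x:     mean input to the unit
    sigma: noise level (std) of the input; must be > 0
    k:     neuron-dependent scaling constant (0.2 is illustrative only)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ks = k * sigma
    z = x / ks
    # Numerically stable softplus: log(1 + e^z) = max(z, 0) + log1p(e^-|z|).
    return ks * (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def dense(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product; a stand-in for a convolution layer."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def residual_block(x: Vector, w1: Matrix, w2: Matrix,
                   act: Callable[[float], float]) -> Vector:
    """Toy basic block computing act(F(x) + x), with F(x) = W2 . act(W1 . x)."""
    h = [act(v) for v in dense(w1, x)]   # first layer followed by activation
    f = dense(w2, h)                     # residual branch F(x)
    # Add the identity shortcut, then apply the activation.
    return [act(fi + xi) for fi, xi in zip(f, x)]


if __name__ == "__main__":
    nsp = partial(noisy_softplus, sigma=1.0)  # sigma fixed for illustration
    for v in (-2.0, 0.0, 2.0):
        print(f"x={v:+.1f}  ReLU={relu(v):.4f}  NSP={nsp(v):.4f}")
    w1 = [[0.5, -0.2], [0.1, 0.3]]
    w2 = [[0.4, 0.0], [-0.3, 0.2]]
    x = [1.0, -0.5]
    print("ReLU block:", residual_block(x, w1, w2, relu))
    print("NSP block: ", residual_block(x, w1, w2, nsp))
```

Running the script prints the two activations side by side; the NSP values never fall below the ReLU ones because the noise term smooths the corner at zero. Dense layers stand in for 3×3 convolutions and batch normalization is omitted so the sketch stays dependency-free.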

Yanjun Zeng | Qian Liu | Jin Du | Yunhua Chen | Ling Zhang
