A Joint Convolutional Bidirectional LSTM Framework for Facial Expression Recognition

SUMMARY Facial expressions are generated by the actions of facial muscles located in different facial regions. The spatial dependencies between these regions are worth exploring and can improve the performance of facial expression recognition. In this letter we propose a joint convolutional bidirectional long short-term memory (JCBLSTM) framework that jointly models discriminative facial textures and the spatial relations between different regions. We treat each row or column of the feature maps output by the CNN as an individual ordered sequence and employ a bidirectional LSTM to model the spatial dependencies within it. Moreover, a shortcut connection for the convolutional feature maps is introduced to form a joint feature representation. We conduct experiments on two databases to evaluate the proposed JCBLSTM method. The experimental results demonstrate that JCBLSTM achieves state-of-the-art performance on Multi-PIE and very competitive results on FER-2013.
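The row/column scanning idea can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes an H × W × C feature map stored as nested lists and untrained, randomly initialized LSTM weights. It also assumes the shortcut connection is a per-position concatenation of the original convolutional feature with the forward and backward hidden states; the letter does not specify this fusion here. All helper names (`LSTM`, `bilstm`, `scan_rows`, `transpose`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (c),
        # each acting on the concatenation [x; h_prev].
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                      for _ in range(hidden_dim)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def run(self, seq):
        """Return the hidden state at every step of `seq` (a list of vectors)."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in seq:
            z = list(x) + h
            i = [sigmoid(a) for a in self._affine("i", z)]
            f = [sigmoid(a) for a in self._affine("f", z)]
            o = [sigmoid(a) for a in self._affine("o", z)]
            g = [math.tanh(a) for a in self._affine("c", z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


def bilstm(seq, fwd, bwd):
    """Concatenate forward and backward hidden states at each position."""
    forward = fwd.run(seq)
    backward = bwd.run(seq[::-1])[::-1]  # run right-to-left, then realign
    return [hf + hb for hf, hb in zip(forward, backward)]


def scan_rows(fmap, fwd, bwd):
    """Treat each row of an H x W x C map as a sequence of W vectors.

    Assumed shortcut: output vector = [conv feature; BiLSTM state],
    giving an H x W x (C + 2 * hidden_dim) map.
    """
    return [[list(x) + ctx for x, ctx in zip(row, bilstm(row, fwd, bwd))]
            for row in fmap]


def transpose(fmap):
    """Swap the two spatial axes so that columns become rows."""
    return [list(col) for col in zip(*fmap)]


H, W, C, D = 3, 4, 5, 2
rng = random.Random(1)
fmap = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]

# Row-wise scan, and column-wise scan via transpose -> scan -> transpose back.
rows = scan_rows(fmap, LSTM(C, D, seed=2), LSTM(C, D, seed=3))
cols = transpose(scan_rows(transpose(fmap), LSTM(C, D, seed=4), LSTM(C, D, seed=5)))
print(len(rows), len(rows[0]), len(rows[0][0]))  # 3 4 9
print(len(cols), len(cols[0]), len(cols[0][0]))  # 3 4 9
```

Both scans keep the H × W spatial layout. Each position grows from C to C + 2D channels because the original convolutional feature is carried through unchanged next to the sequence context. Column scanning reuses the row routine on the transposed map, so the two directions share one code path.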
