Spatial–Temporal Recurrent Neural Network for Emotion Recognition

In this paper, we propose a novel deep learning framework, called the spatial–temporal recurrent neural network (STRNN), which integrates feature learning from both the spatial and temporal information of signal sources into a unified spatial–temporal dependency model. In STRNN, to capture the spatially co-occurrent variations of human emotions, a multidirectional recurrent neural network (RNN) layer models long-range contextual cues by traversing the spatial regions of each temporal slice along different directions. A bi-directional temporal RNN layer then learns discriminative features that characterize the temporal dependencies of the sequences produced by the spatial RNN layer. To select the salient regions that are most discriminative for emotion recognition, we impose sparse projection onto the hidden states of the spatial and temporal domains. Consequently, the proposed two-layer RNN model provides an effective way to exploit both the spatial and temporal dependencies of the input signals for emotion recognition. Experimental results on public electroencephalogram and facial expression emotion datasets demonstrate that the proposed STRNN method is more competitive than state-of-the-art methods.
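The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a plain tanh RNN cell, spatial regions arranged in a 1-D ordering traversed in two directions (forward and reverse), and dense projection matrices with an L1 penalty standing in for the sparse projection. All names and dimensions (`rnn_scan`, `strnn_forward`, `g_spatial`, `Hs`, `K`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

def rnn_scan(inputs, W_in, W_rec, b):
    """Plain tanh RNN; returns the hidden state at every step."""
    h = np.zeros(W_rec.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W_rec @ h + b)
        states.append(h)
    return np.stack(states)

def init_rnn(rng, d_in, d_hid):
    """Small random weights for one RNN (illustrative initialisation)."""
    return (rng.normal(0, 0.1, (d_hid, d_in)),
            rng.normal(0, 0.1, (d_hid, d_hid)),
            np.zeros(d_hid))

def strnn_forward(x, orders, params):
    """x has shape (T, N, D): temporal slices, spatial regions, features."""
    slice_feats = []
    for t in range(x.shape[0]):
        feat = 0.0
        # Spatial layer: one RNN per traversal direction over the regions.
        for order, rnn, g in zip(orders, params["spatial"], params["g_spatial"]):
            h = rnn_scan(x[t, order], *rnn)      # (N, Hs), in traversal order
            h_by_region = np.empty_like(h)
            h_by_region[order] = h               # restore original region order
            feat = feat + g @ h_by_region        # project over regions: (K, Hs)
        slice_feats.append(feat.ravel())
    s = np.stack(slice_feats)                    # (T, K * Hs)

    # Temporal layer: bi-directional RNN over the slice features.
    h_fwd = rnn_scan(s, *params["t_fwd"])                # (T, Ht)
    h_bwd = rnn_scan(s[::-1], *params["t_bwd"])[::-1]    # (T, Ht)
    q = params["g_fwd"] @ h_fwd + params["g_bwd"] @ h_bwd  # (Kt, Ht)

    # Softmax classifier over emotion classes.
    logits = params["W_out"] @ q.ravel() + params["b_out"]
    e = np.exp(logits - logits.max())
    return e / e.sum()

def l1_penalty(params):
    """L1 norm of all projection matrices (assumed sparsity regulariser)."""
    mats = list(params["g_spatial"]) + [params["g_fwd"], params["g_bwd"]]
    return sum(np.abs(m).sum() for m in mats)

# Toy dimensions, chosen arbitrarily for the demonstration.
rng = np.random.default_rng(0)
T, N, D = 5, 6, 4               # slices, regions, features per region
Hs, K, Ht, Kt, C = 8, 3, 7, 2, 3
orders = [np.arange(N), np.arange(N)[::-1]]   # two traversal directions
params = {
    "spatial": [init_rnn(rng, D, Hs) for _ in orders],
    "g_spatial": [rng.normal(0, 0.1, (K, N)) for _ in orders],
    "t_fwd": init_rnn(rng, K * Hs, Ht),
    "t_bwd": init_rnn(rng, K * Hs, Ht),
    "g_fwd": rng.normal(0, 0.1, (Kt, T)),
    "g_bwd": rng.normal(0, 0.1, (Kt, T)),
    "W_out": rng.normal(0, 0.1, (C, Kt * Ht)),
    "b_out": np.zeros(C),
}
x = rng.normal(size=(T, N, D))
probs = strnn_forward(x, orders, params)
penalty = l1_penalty(params)
```

In this sketch, `probs` is a probability vector over `C` hypothetical emotion classes. During training, `penalty` would be added to the classification loss with a weighting coefficient so that many projection weights shrink toward zero, which is one common way to obtain the kind of region selection the abstract describes.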
