Multi-view Laplacian least squares for human emotion recognition

Abstract

Human emotion recognition is an emerging and important area of human–computer interaction and artificial intelligence, and it is increasingly addressed with multi-view learning methods. Subspace learning is an important direction of multi-view learning. However, most existing subspace learning methods cannot make full use of both category discriminant information and local neighborhood information. Partial least squares (PLS), a typical subspace learning method, performs better and more robustly than many other subspace learning methods because it is optimized iteratively. However, PLS assumes a linear relationship between views and is limited to two views. In this paper, a new nonlinear method, multi-view Laplacian least squares (MvLLS), is proposed. MvLLS constructs a global Laplacian weighted graph (GLWP) that introduces category discriminant information while preserving local neighborhood information. Like PLS, MvLLS is optimized iteratively, and it extends PLS to multiple views. The proposed method is readily extensible and robust. To meet the requirements of large-scale applications, weighted local preserving embedding (WLPE) is proposed as the out-of-sample extension of MvLLS, based on the idea of preserving the manifold structure of the original space. Finally, the proposed method is evaluated on three multi-view emotion recognition tasks, and the experimental results validate the effectiveness and robustness of MvLLS.
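The abstract rests on two ingredients: a graph Laplacian that encodes both label and neighborhood structure, and the iterative optimization that PLS uses. The sketch below illustrates each under stated assumptions. `supervised_knn_laplacian` is a generic stand-in (a heat-kernel k-nearest-neighbor graph that keeps only same-class edges), not the paper's GLWP, whose exact construction the abstract does not give. `pls_nipals` is standard two-view PLS fitted with the NIPALS iteration, i.e. the baseline that MvLLS extends, not MvLLS itself. All function names and parameters (`k`, `sigma`, `n_components`, `max_iter`, `tol`) are hypothetical.

```python
import numpy as np


def supervised_knn_laplacian(X, labels, k=5, sigma=1.0):
    """Hypothetical stand-in for a label-aware graph: heat-kernel weights on
    k-nearest-neighbor pairs that share a class label. Returns L = D - W."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        d = d2[i].copy()
        d[i] = np.inf                      # exclude the point itself
        for j in np.argsort(d)[:k]:        # k nearest neighbors of sample i
            if labels[i] == labels[j]:     # keep only same-class edges
                W[i, j] = np.exp(-d2[i, j] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)                 # symmetrize the graph
    D = np.diag(W.sum(axis=1))             # degree matrix
    return D - W


def pls_nipals(X, Y, n_components=2, max_iter=500, tol=1e-10):
    """Two-view PLS via NIPALS. X (n, p) and Y (n, q) are assumed column-centered.
    Returns X-weights W, Y-weights C, X-scores T and Y-scores U."""
    X = X.astype(float).copy()
    Y = Y.astype(float).copy()
    n, p = X.shape
    q = Y.shape[1]
    W = np.zeros((p, n_components))
    C = np.zeros((q, n_components))
    T = np.zeros((n, n_components))
    U = np.zeros((n, n_components))
    for k in range(n_components):
        # initialize the Y-score with the largest-norm column of (deflated) Y
        u = Y[:, [np.argmax(np.linalg.norm(Y, axis=0))]]
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)         # X-weight, unit norm
            t = X @ w                      # X-score
            c = Y.T @ t
            c /= np.linalg.norm(c)         # Y-weight, unit norm
            u_new = Y @ c                  # Y-score
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        # deflate both views by their regression on the X-score t
        tt = float(t.T @ t)
        X -= t @ (X.T @ t / tt).T
        Y -= t @ (Y.T @ t / tt).T
        W[:, k], C[:, k] = w.ravel(), c.ravel()
        T[:, k], U[:, k] = t.ravel(), u.ravel()
    return W, C, T, U
```

At convergence, the first X-weight is the leading left singular vector of `X.T @ Y`, which is what the iteration computes by alternating between the two views. A Laplacian built this way satisfies f^T L f = (1/2) Σ_ij w_ij (f_i − f_j)^2, so penalizing it on projected scores keeps same-class neighbors close; how MvLLS couples such a term with the PLS objective across more than two views is not specified in the abstract.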
