The Labeled Multiple Canonical Correlation Analysis for Information Fusion

The objective of multimodal information fusion is to mathematically analyze the information carried by different sources and to create a new representation that can be used more effectively in pattern recognition and other multimedia information processing tasks. In this paper, we introduce a new method for multimodal information fusion and representation based on Labeled Multiple Canonical Correlation Analysis (LMCCA). By incorporating the class label information of the training samples, the proposed LMCCA ensures that the fused features carry the discriminative characteristics of the multimodal information representations and are capable of providing superior recognition performance. We implement a prototype of LMCCA and demonstrate its effectiveness on handwritten digit recognition, face recognition, and object recognition using multiple features, and on bimodal human emotion recognition using information from both the audio and visual domains. The generic nature of LMCCA allows it to take as input features extracted by any means, including those produced by deep learning (DL) methods. Experimental results show that the proposed method enhances the performance of both statistical machine learning methods and DL-based methods.
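To make the fusion idea concrete, the following is a minimal sketch of feature-level fusion with standard (unlabeled) multiset canonical correlation analysis, written with NumPy. It is not the LMCCA formulation of this paper: it does not use class labels, and the function name `mcca_fuse`, the regularization parameter `reg`, and the serial (concatenation) fusion step are assumptions made only for illustration.

```python
import numpy as np

def mcca_fuse(views, n_components=2, reg=1e-3):
    """Illustrative multiset CCA fusion (hypothetical helper, not LMCCA).

    views: list of arrays, each (n_samples, d_i), one per modality/feature set.
    Returns an array of shape (n_samples, len(views) * n_components).
    """
    # Center every view; rows are samples, columns are features.
    views = [v - v.mean(axis=0) for v in views]
    n = views[0].shape[0]
    dims = [v.shape[1] for v in views]
    offsets = np.concatenate([[0], np.cumsum(dims)])
    slices = [slice(offsets[i], offsets[i + 1]) for i in range(len(views))]

    # Joint covariance of the stacked features.
    X = np.hstack(views)
    C = X.T @ X / (n - 1)

    # D holds the (regularized) within-view blocks, B the between-view blocks.
    D = np.zeros_like(C)
    B = C.copy()
    for s, d in zip(slices, dims):
        D[s, s] = C[s, s] + reg * np.eye(d)
        B[s, s] = 0.0

    # Solve B w = lambda D w by whitening with D^{-1/2} (D is positive definite).
    evals, evecs = np.linalg.eigh(D)
    D_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    M = D_inv_sqrt @ B @ D_inv_sqrt
    _, U = np.linalg.eigh(M)                # eigenvalues in ascending order
    W = D_inv_sqrt @ U[:, ::-1][:, :n_components]

    # Project each view with its own block of W, then concatenate.
    return np.hstack([v @ W[s] for v, s in zip(views, slices)])

# Usage on synthetic data: two views driven by one shared latent variable.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
view_a = z @ np.array([[1.0, -0.5, 0.8, 0.3, -1.2]]) + 0.1 * rng.normal(size=(200, 5))
view_b = z @ np.array([[0.7, 1.1, -0.4, 0.9]]) + 0.1 * rng.normal(size=(200, 4))
fused = mcca_fuse([view_a, view_b], n_components=2)
```

The fused matrix can then be passed to any classifier. A label-aware method such as LMCCA would additionally use the training labels when estimating the projections, which this sketch deliberately leaves out.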
