Kernel Cross-Modal Factor Analysis for Information Fusion With Application to Bimodal Emotion Recognition

In this paper, we investigate kernel-based methods for multimodal information analysis and fusion. We introduce a novel approach, kernel cross-modal factor analysis (KCFA), which identifies the optimal transformations for representing the coupled patterns between two different subsets of features by minimizing the Frobenius norm of the difference between the two feature sets in the transformed domain. The kernel trick is used to model the nonlinear relationship between the two multidimensional variables. We compare the method with kernel canonical correlation analysis (KCCA), which finds projection directions that maximize the correlation between two modalities, and with kernel matrix fusion, which integrates the kernel matrices of the respective modalities through algebraic operations. The performance of the introduced method is evaluated on an audiovisual bimodal emotion recognition problem. Features are first extracted from the audio and visual channels separately. The presented approaches are then used to analyze the cross-modal relationship between the audio and visual features. A hidden Markov model is subsequently applied to characterize the statistical dependence across successive time segments and to identify the inherent temporal structure of the features in the transformed domain. The effectiveness of the proposed solution is demonstrated through extensive experimentation.
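The abstract states the KCFA objective only in words. As a minimal sketch, assuming that KCFA amounts to the linear cross-modal factor analysis criterion (minimize ||XA - YB||_F over orthogonal A and B, solved by an SVD of X^T Y) applied to the centered feature-space images of the two modalities, the NumPy code below realizes the feature-space data through kernel PCA coordinates. The RBF kernel, its gamma values, the eigenvalue tolerance, the function names, and the synthetic data are illustrative assumptions, not the paper's derivation or experimental settings.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative rounding errors


def center_kernel(K):
    """Double-center K so that the mapped samples have zero mean in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kernel_coordinates(K, tol=1e-10):
    """Explicit coordinates Z with Z @ Z.T ~= K (kernel PCA, all non-negligible components)."""
    w, V = np.linalg.eigh(K)
    keep = w > tol * w.max()
    return V[:, keep] * np.sqrt(w[keep])


def cfa(X, Y, k):
    """Linear CFA on centered data X (n x p) and Y (n x q).

    With square orthogonal A and B (p = q), ||X A - Y B||_F^2 equals
    ||X||_F^2 + ||Y||_F^2 - 2 tr(A^T X^T Y B), so minimizing it means maximizing
    the trace term, which the SVD X^T Y = U S V^T solves with A = U, B = V.
    Keeping the leading k columns gives the k most strongly coupled factor pairs.
    """
    U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    A, B = U[:, :k], Vt[:k].T
    return X @ A, Y @ B, s[:k]


def kernel_cfa(Kx, Ky, k):
    """Hypothetical KCFA sketch: linear CFA on each modality's centered feature-space data."""
    Zx = kernel_coordinates(center_kernel(Kx))
    Zy = kernel_coordinates(center_kernel(Ky))
    return cfa(Zx, Zy, k)


# Toy usage with synthetic "audio" and "visual" features (random data, not from the paper).
rng = np.random.default_rng(0)
audio = rng.normal(size=(60, 8))
visual = np.tanh(audio[:, :5]) + 0.1 * rng.normal(size=(60, 5))
Pa, Pv, coupling = kernel_cfa(rbf_kernel(audio, 0.1), rbf_kernel(visual, 0.2), k=4)
print(Pa.shape, Pv.shape)  # (60, 4) (60, 4)
```

The kernel-coordinate route is chosen for transparency: keeping every non-negligible eigenvalue makes Z Z^T reproduce the centered kernel matrix, so CFA on Z matches CFA in the feature space up to a rotation, and with a linear kernel it returns the same coupling strengths as linear CFA. For large numbers of segments, the n x n eigendecomposition becomes expensive and a low-rank approximation would typically be substituted. How the projected audio and visual factors are then combined and passed to the HMM stage is left open here.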
