Isomorphic and sparse multimodal data representation based on correlation analysis

Multimodal data has become increasingly common in recent years. Learning a representation for such data is both interesting and challenging, because the representation strongly affects the performance of downstream applications such as retrieval and clustering. However, it is difficult to find an efficient representation for multimedia data of different modalities, whose low-level features are heterogeneous. It is also hard to bridge the semantic gap between low-level features and high-level semantics. In this paper, we propose an isomorphic and sparse multimodal data representation method. First, we learn an isomorphic content representation by analyzing the kernel canonical correlation among heterogeneous features. Second, we propose an optimization strategy for graph-based semantic sparse boosting. Extensive experiments demonstrate the superiority of our method over several existing algorithms.
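
To make the first step more concrete, the following is a minimal sketch of regularized kernel canonical correlation analysis between two feature views, which maps both views into a shared space. It is an illustration of the general technique and not necessarily the exact formulation used in this paper. The RBF kernel, the values of `gamma` and `reg`, the function names, and the synthetic two-view data are all assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(Kx, Ky, reg, n_components):
    """Regularized kernel CCA on two centered kernel matrices.

    Solves (Kx + reg I)^-1 Ky (Ky + reg I)^-1 Kx alpha = rho^2 alpha
    and recovers beta = (Ky + reg I)^-1 Kx alpha / rho.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    Ry = np.linalg.solve(Ky + reg * I, Kx)        # (Ky + reg I)^-1 Kx
    M = np.linalg.solve(Kx + reg * I, Ky @ Ry)    # (Kx + reg I)^-1 Ky Ry
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:n_components]  # largest eigenvalues first
    rho = np.sqrt(np.clip(vals.real[order], 0.0, 1.0))
    alpha = vecs[:, order].real
    beta = Ry @ alpha / np.maximum(rho, 1e-12)
    return alpha, beta, rho

# Synthetic two-view data driven by one shared latent variable z (hypothetical).
rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, size=60)
X = np.column_stack([np.sin(z), np.cos(z)]) + 0.05 * rng.normal(size=(60, 2))
Y = np.column_stack([z, z ** 2]) + 0.05 * rng.normal(size=(60, 2))

Kx = center_kernel(rbf_kernel(X, gamma=1.0))
Ky = center_kernel(rbf_kernel(Y, gamma=1.0))
alpha, beta, rho = kcca(Kx, Ky, reg=0.1, n_components=2)

# Projections of both views into the shared (isomorphic) space.
U = Kx @ alpha
V = Ky @ beta
```

Here `U` and `V` hold the coordinates of the two views in the common space, so samples from different modalities can be compared directly. The regularizer `reg` keeps the problem well posed; without it, kernel CCA tends to find trivially perfect correlations on the training data.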
