Correlation-based Feature Analysis and Multi-Modality Fusion framework for multimedia semantic retrieval

In this paper, we propose a Correlation-based Feature Analysis (CFA) and Multi-Modality Fusion (CFA-MMF) framework for multimedia semantic concept retrieval. The CFA method reduces the feature space and captures the correlation between features by separating the feature set into groups, called Hidden Coherent Feature Groups (HCFGs), using a Maximum Spanning Tree (MaxST) algorithm. A correlation matrix is first built from the pairwise feature correlations, and a MaxST is then constructed from that matrix. Performing a graph cut on the MaxST yields a set of feature groups in which the intra-group correlation is maximized and the inter-group correlation is minimized. Finally, one classifier is trained for each feature group, and the scores generated by the different classifiers are fused for the final retrieval. We attribute the framework's effectiveness to this reduction in the dimensionality of the feature space. Experimental results on the NUS-WIDE-Lite data set demonstrate the effectiveness of the proposed CFA-MMF framework.
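
The grouping step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the correlation measure or the graph cut criterion, so this sketch assumes absolute Pearson correlation as the edge weight and cuts the MaxST by removing its weakest edges until a requested number of groups remains. All function names (`pearson`, `correlation_matrix`, `maximum_spanning_tree`, `cut_into_groups`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_matrix(columns):
    """Symmetric matrix of absolute pairwise correlations (assumed edge weight)."""
    d = len(columns)
    corr = [[1.0] * d for _ in range(d)]
    for i, j in combinations(range(d), 2):
        corr[i][j] = corr[j][i] = abs(pearson(columns[i], columns[j]))
    return corr


def maximum_spanning_tree(weights):
    """Prim's algorithm, adding the heaviest edge that leaves the tree at each step."""
    d = len(weights)
    in_tree = {0}
    edges = []
    while len(in_tree) < d:
        w, i, j = max(
            (weights[i][j], i, j)
            for i in in_tree
            for j in range(d)
            if j not in in_tree
        )
        edges.append((w, i, j))
        in_tree.add(j)
    return edges


def cut_into_groups(edges, d, n_groups):
    """Drop the n_groups - 1 weakest tree edges and return the connected components.

    Assumption: the paper's actual cut criterion may differ from this rule.
    """
    kept = sorted(edges, reverse=True)[: max(d - n_groups, 0)]
    parent = list(range(d))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for _, i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for f in range(d):
        groups.setdefault(find(f), []).append(f)
    return sorted(groups.values())


# Toy data: four features (one list per feature) observed on six samples.
# Features 0 and 1 rise together; features 2 and 3 alternate together.
columns = [
    [1, 2, 3, 4, 5, 6],
    [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],
    [1, -1, 1, -1, 1, -1],
    [3.0, -2.9, 3.1, -3.0, 2.8, -3.1],
]
corr = correlation_matrix(columns)
tree = maximum_spanning_tree(corr)
groups = cut_into_groups(tree, len(columns), n_groups=2)
print(groups)
```

On this toy data the two strongly correlated pairs end up in separate groups, while the weak edge joining them is the one that gets cut. In the framework, each resulting group would then be passed to its own classifier before the scores are fused.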
