Content-Based Multimedia Retrieval Using Feature Correlation Clustering and Fusion

Processing visual features alone is no longer sufficient for multimedia semantic retrieval, because multimedia data are complex and usually span several modalities, such as graphics, text, speech, and video. It is therefore crucial to fully exploit three kinds of correlation: between each feature and the target concept, among features within a modality, and among features across modalities. In this paper, the authors propose a Feature Correlation Clustering-based Multi-Modality Fusion Framework (FCC-MMF) for multimedia semantic retrieval. Features from different modalities are combined into one feature set with a common representation through a normalization and discretization process. Within and across modalities, multiple correspondence analysis (MCA) is used to obtain the correlation between feature-value pairs, which are then projected onto the first two principal components. The K-medoids algorithm, a widely used partitional clustering algorithm, is selected to minimize the Euclidean distances within the resulting clusters and to produce highly intra-correlated clusters of feature-value pairs. Because the value pairs of one feature may fall into different clusters, a majority vote then decides which cluster each feature belongs to. Once the feature clusters are formed, one classifier is built and trained for each cluster. The correlation and confidence of each classifier are taken into account when fusing the classification scores, and mean average precision (MAP) is used to evaluate the final ranked classification scores. Finally, the proposed framework is applied to the NUS-WIDE-Lite data set to demonstrate its effectiveness in multimedia semantic retrieval.
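
The abstract does not specify how features are normalized and discretized into a common representation. As a minimal sketch, assuming min-max normalization followed by equal-width binning (both are assumptions; the framework's actual discretization may differ, for example a supervised method), the snippet below turns numeric features from any modality into nominal feature-value pair labels. The helper names and the `f<index>=<bin>` label format are hypothetical.

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Rescale one feature column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def equal_width_bins(column: List[float], n_bins: int) -> List[int]:
    """Map values in [0, 1] to bin indices 0..n_bins-1 (1.0 goes to the last bin)."""
    return [min(int(x * n_bins), n_bins - 1) for x in column]


def to_feature_value_pairs(data: List[List[float]], n_bins: int = 3) -> List[List[str]]:
    """Convert a numeric instance-by-feature matrix (features from all
    modalities, already concatenated) into nominal labels 'f<j>=<bin>'."""
    n_features = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_features)]
    binned = [equal_width_bins(min_max_normalize(col), n_bins) for col in columns]
    return [[f"f{j}={binned[j][i]}" for j in range(n_features)]
            for i in range(len(data))]
```

For example, `to_feature_value_pairs([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])` yields `[['f0=0', 'f1=0'], ['f0=1', 'f1=1'], ['f0=2', 'f1=2']]` with the default three bins. Once every feature is expressed this way, all modalities share one nominal representation that MCA can consume.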
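
The MCA projection of feature-value pairs onto two principal components can be sketched by treating MCA as correspondence analysis of the indicator (one-hot) matrix and keeping the first two axes. This is a minimal pure-Python sketch under that assumption: the name `mca_coordinates`, the power-iteration solver, and the fixed iteration count are illustrative choices rather than the authors' implementation, and the target concept is not included in the analysis here. A real system would call a proper SVD routine instead.

```python
import math
from typing import Dict, List, Tuple


def mca_coordinates(records: List[List[str]], n_iter: int = 500) -> Dict[str, Tuple[float, float]]:
    """Principal coordinates of each category (feature-value pair) on the
    first two MCA axes, via correspondence analysis of the indicator matrix.
    Every record must contain at least one category."""
    cats = sorted({c for row in records for c in row})
    col = {c: k for k, c in enumerate(cats)}
    n, J = len(records), len(cats)
    # Indicator matrix Z (n x J): Z[i][k] = 1 if record i contains category k.
    Z = [[0.0] * J for _ in range(n)]
    for i, row in enumerate(records):
        for c in row:
            Z[i][col[c]] = 1.0
    total = sum(sum(row) for row in Z)
    r = [sum(row) / total for row in Z]                              # row masses
    cm = [sum(Z[i][k] for i in range(n)) / total for k in range(J)]  # column masses
    # Standardized residuals S = D_r^(-1/2) (Z/total - r c^T) D_c^(-1/2).
    S = [[(Z[i][k] / total - r[i] * cm[k]) / math.sqrt(r[i] * cm[k])
          for k in range(J)] for i in range(n)]
    # M = S^T S; its leading eigenvectors are the right singular vectors of S.
    M = [[sum(S[i][a] * S[i][b] for i in range(n)) for b in range(J)] for a in range(J)]
    axes = []
    for _ in range(2):
        # Non-uniform start: a uniform vector can coincide with the trivial null direction.
        v = [1.0 + 0.1 * k for k in range(J)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(n_iter):  # power iteration
            w = [sum(M[a][b] * v[b] for b in range(J)) for a in range(J)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v = principal inertia of this axis.
        lam = max(sum(v[a] * M[a][b] * v[b] for a in range(J) for b in range(J)), 0.0)
        # Column principal coordinates: sqrt(lam) * v_k / sqrt(c_k).
        axes.append([math.sqrt(lam) * v[k] / math.sqrt(cm[k]) for k in range(J)])
        # Deflate so the next power iteration finds the second axis.
        M = [[M[a][b] - lam * v[a] * v[b] for b in range(J)] for a in range(J)]
    return {c: (axes[0][k], axes[1][k]) for k, c in enumerate(cats)}
```

Categories that co-occur in the same records receive nearby coordinates, which is what the clustering step exploits. With two perfectly associated features, for instance, `f0=0` and `f1=0` land on the same point of the first axis, on the opposite side from `f0=1` and `f1=1`.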
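
The K-medoids clustering and the majority vote described above can be sketched as follows, assuming each feature-value pair already has a 2-D MCA coordinate and is named `feature=value`. The K-medoids variant shown is the simple alternating (Voronoi-iteration) scheme rather than PAM's swap search, and initializing from the first `k` points is a simplification made only for this sketch; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def k_medoids(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Alternating K-medoids with Euclidean distance; returns a cluster
    label (0..k-1) per point. Initial medoids are the first k points."""
    medoids = list(range(k))
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest medoid.
        labels = [min(range(k), key=lambda c: math.dist(p, points[medoids[c]]))
                  for p in points]
        # Update step: the member with the smallest total distance becomes the medoid.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(
                members, key=lambda m: sum(math.dist(points[m], points[i]) for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def assign_features(pair_coords: Dict[str, Point], k: int) -> Dict[str, int]:
    """Cluster feature-value pairs, then give each feature the cluster that
    most of its value pairs fall into (ties go to the smaller cluster id)."""
    names = sorted(pair_coords)
    labels = k_medoids([pair_coords[n] for n in names], k)
    votes: Dict[str, Counter] = {}
    for name, lab in zip(names, labels):
        feature = name.split("=")[0]
        votes.setdefault(feature, Counter())[lab] += 1
    return {f: min(cnt, key=lambda c: (-cnt[c], c)) for f, cnt in votes.items()}
```

With pairs `f0=0`, `f0=1`, and `f1=2` near the origin and `f0=2`, `f1=0`, and `f1=1` near (5, 5), two clusters form, and the vote places `f0` in cluster 0 and `f1` in cluster 1, because each feature has two of its three value pairs in that cluster. One classifier would then be trained on the features of each cluster.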
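
The abstract states that classifier correlation and confidence are considered during score fusion but does not give the formula. The sketch below assumes a weighted linear combination, where each weight is a hypothetical per-classifier number (for example, a correlation value multiplied by a confidence value), and then scores the resulting ranking with standard non-interpolated average precision (AP); MAP is the mean AP over concepts. The function names are illustrative.

```python
from typing import List, Sequence


def fuse_scores(scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted linear fusion: scores[m][i] is classifier m's score for
    instance i; weights[m] is a hypothetical per-classifier weight."""
    total = sum(weights)
    n = len(scores[0])
    return [sum(w * s[i] for w, s in zip(weights, scores)) / total for i in range(n)]


def average_precision(fused: Sequence[float], relevant: Sequence[bool]) -> float:
    """Non-interpolated AP of the ranking obtained by sorting scores in
    descending order: mean of precision@rank over the relevant positions."""
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(aps: Sequence[float]) -> float:
    """MAP: the mean of per-concept AP values."""
    return sum(aps) / len(aps)
```

For example, `average_precision(fuse_scores([[0.9, 0.2, 0.6, 0.1], [0.7, 0.4, 0.3, 0.8]], [2.0, 1.0]), [True, True, False, False])` returns 0.75, since the fused ranking places the relevant instances at ranks 1 and 4. Averaging such AP values over all target concepts gives the MAP.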
