Correlation-Based Deep Learning for Multimedia Semantic Concept Detection

Concept detection from multimedia data is an emerging topic because of its applicability to a wide range of applications in both academia and industry. However, it faces several unavoidable challenges, including the high volume and variety of multimedia data and its skewed distribution. To cope with these challenges, this paper proposes a novel framework that integrates two correlation-based methods, Feature-Correlation Maximum Spanning Tree (FC-MST) and Negative-based Sampling (NS), with a well-known deep learning algorithm, the Convolutional Neural Network (CNN). First, FC-MST is introduced to select the most relevant low-level features, which are extracted from multiple modalities, and to determine the dimension of the CNN's input layer. Second, NS is adopted to improve batch sampling in the CNN. Experimental results on the NUS-WIDE image data set, used as a web-based application, demonstrate the effectiveness of the proposed framework for semantic concept detection compared to other well-known classifiers.
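To make the idea of a maximum spanning tree over feature correlations more concrete, the following is a minimal, generic sketch. It is not the FC-MST algorithm itself, whose details are not given in this abstract. It assumes absolute Pearson correlation as the edge weight and uses Prim's algorithm to build the tree; the function names and the toy feature columns are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def correlation_matrix(columns):
    """Absolute pairwise correlation between feature columns (assumed edge weight)."""
    d = len(columns)
    return [[abs(pearson(columns[i], columns[j])) for j in range(d)] for i in range(d)]


def maximum_spanning_tree(weights):
    """Prim's algorithm, always picking the heaviest edge that leaves the current tree.

    Returns a list of (parent, child, weight) edges, starting from node 0.
    """
    d = len(weights)
    if d == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < d:
        best = None
        for i in sorted(in_tree):
            for j in range(d):
                if j not in in_tree and (best is None or weights[i][j] > best[2]):
                    best = (i, j, weights[i][j])
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical toy data: four feature columns observed over five samples.
features = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],  # perfectly correlated with feature 0
    [5, 3, 4, 1, 2],
    [1, 0, 1, 0, 1],
]
edges = maximum_spanning_tree(correlation_matrix(features))
for parent, child, weight in edges:
    print(f"feature {parent} -- feature {child}: |r| = {weight:.3f}")
```

The resulting tree keeps, for each feature, its strongest correlation link to the rest of the feature set. How such a tree is then used to select features and to fix the CNN input dimension is specific to the paper and is not reproduced here.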
