Classifier-specific intermediate representation for multimedia tasks

Video annotation and multimedia classification play important roles in many applications, such as video indexing and retrieval. To improve video annotation and event detection, researchers have proposed using intermediate concept classifiers built over concept lexica to help understand videos. However, it is difficult to judge how many, and which, concepts are sufficient for a particular video analysis task. Additionally, training robust semantic concept classifiers requires a large number of positive examples, which incurs a high human annotation cost. In this paper, we propose an approach that automatically learns an intermediate representation from video features jointly with a classifier. The joint optimization of the two components makes them mutually beneficial: the intermediate representation and the classifier become tightly correlated. This classifier-dependent intermediate representation not only accurately reflects the task semantics but is also better suited to the specific classifier. The result is a discriminative semantic analysis framework built on a tightly coupled intermediate representation. Experiments on video annotation and multimedia event detection with real-world videos demonstrate the effectiveness of the proposed approach.
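The abstract does not spell out the optimization, but the joint-learning idea can be sketched with alternating updates. The following is a minimal illustrative sketch, not the authors' actual formulation: it assumes the intermediate representation is a linear projection `P` of the video features and the classifier is a ridge-style linear model `W` on top of it, with both learned against one shared squared-loss objective. The function name, the objective, and all hyperparameters are assumptions introduced for illustration.

```python
import numpy as np

def joint_learn(X, Y, k=5, lam=0.1, lr=1e-3, iters=200, seed=0):
    """Jointly learn a projection P (features -> k intermediate concepts)
    and a classifier W on top of it, by alternately minimizing the shared
    objective  ||X P W - Y||_F^2 + lam * (||P||_F^2 + ||W||_F^2).

    This is an illustrative alternating scheme, not the paper's algorithm:
    - W step: exact ridge-regression solution given the current representation.
    - P step: one gradient step on P given the current classifier,
      so the representation adapts to the classifier it feeds.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    c = Y.shape[1]
    P = rng.normal(scale=0.1, size=(d, k))
    W = rng.normal(scale=0.1, size=(k, c))
    losses = []
    for _ in range(iters):
        # Classifier step: ridge regression on the intermediate representation.
        Z = X @ P
        W = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ Y)
        # Representation step: gradient of the shared objective w.r.t. P.
        R = X @ P @ W - Y
        grad_P = 2 * X.T @ R @ W.T + 2 * lam * P
        P -= lr * grad_P
        losses.append(float(np.sum(R**2))
                      + lam * (float(np.sum(P**2)) + float(np.sum(W**2))))
    return P, W, losses
```

Because both updates descend the same objective, neither component is fixed in advance: the projection is shaped by the classifier's errors, which is the "mutually beneficial" coupling the abstract refers to, in contrast to pipelines that train concept detectors first and the task classifier afterwards.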