Marginalized multi-layer multi-instance kernel for video concept detection

Video concept detection has been studied extensively in recent years. Most existing approaches treat video as a flat data sequence. However, video is inherently hierarchical: it comprises multiple layers (e.g., shot, frame, and region), and a multi-instance relationship is embedded in each pair of contiguous layers. In this paper, we propose a novel kernel for video concept detection, termed the marginalized multi-layer multi-instance (MarMLMI) kernel. Unlike most existing methods, the proposed MarMLMI kernel exploits the hierarchical structure of video, i.e., both the multi-layer structure and the multi-instance relationships. Furthermore, the instance label ambiguity inherent in the multi-instance setting is addressed using the marginalized kernel technique. We perform video concept detection on a real-world video corpus, the TREC Video Retrieval Evaluation (TRECVID) benchmark, and compare the proposed MarMLMI kernel with representative existing approaches. The experimental results demonstrate the effectiveness of the proposed MarMLMI kernel.
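
To make the idea of marginalizing over ambiguous instance labels concrete, the following is a minimal Python sketch of a generic marginalized set kernel between two bags, nested once to mimic a shot/frame/region hierarchy. It is not the MarMLMI kernel as defined in the paper. It assumes that each instance (region) comes with an independent posterior probability of being positive, that an RBF kernel compares instance features, and that frame-level kernels are simply averaged at the shot level. The function names `rbf`, `marginalized_bag_kernel`, and `shot_kernel`, and the parameter `gamma`, are hypothetical and chosen for illustration only.

```python
import math


def rbf(x, y, gamma=1.0):
    """RBF kernel between two feature vectors (assumed instance kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def marginalized_bag_kernel(bag1, post1, bag2, post2, gamma=1.0):
    """Set kernel with the hidden instance labels marginalized out.

    bag1, bag2   : lists of feature vectors (e.g., regions of a frame)
    post1, post2 : assumed posteriors P(label = 1) for each instance

    Each instance pair is weighted by the probability that both
    instances carry the same hidden label (delta kernel on labels,
    independence between instances assumed).
    """
    total = 0.0
    for x, p in zip(bag1, post1):
        for y, q in zip(bag2, post2):
            agree = p * q + (1.0 - p) * (1.0 - q)
            total += agree * rbf(x, y, gamma)
    return total


def shot_kernel(shot1, shot2, gamma=1.0):
    """Hypothetical second layer: average the frame-level kernels.

    Each shot is a list of frames; each frame is a pair
    (regions, posteriors) as accepted by marginalized_bag_kernel.
    """
    total = 0.0
    for regions1, post1 in shot1:
        for regions2, post2 in shot2:
            total += marginalized_bag_kernel(
                regions1, post1, regions2, post2, gamma
            )
    return total / (len(shot1) * len(shot2))


# Toy usage: two shots, each with frames made of 2-D region features.
shot_a = [([[0.0, 0.0], [1.0, 0.0]], [0.9, 0.2])]
shot_b = [([[0.0, 0.1]], [0.8]), ([[1.0, 1.0]], [0.1])]
value = shot_kernel(shot_a, shot_b, gamma=0.5)
```

The resulting value could be placed in a precomputed Gram matrix for a kernel classifier such as an SVM; how the instance posteriors are estimated is left open here and is an assumption of the sketch.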
