Paper information - Multi-layer multi-instance kernel for video concept detection


Most existing methods for video concept detection have not adequately studied the intrinsic hierarchical structure of video content. Unlike the flat attribute-value data used in many existing methods, however, video is essentially a structured medium with a multi-layer representation. For example, a video can be represented by a hierarchy that runs, from large to small, through shot, key-frame, and region. Moreover, it fits the typical Multi-Instance (MI) setting, in which the "bag-instance" correspondence is embedded between contiguous layers. In this paper, we call this multi-layer structure, together with the "bag-instance" relation embedded in it, the Multi-Layer Multi-Instance (MLMI) setting. We formulate video concept detection as an MLMI learning problem in which a rooted tree that embeds the MLMI structure represents a video segment. Furthermore, by fusing information from different layers, we construct a novel MLMI kernel that measures the similarities between instances in the same and in different layers. In contrast to traditional MI learning, the proposed kernel leverages the Multi-Layer structure and the Multi-Instance relations simultaneously. We applied the MLMI kernel to the concept detection task on the TRECVID 2005 corpus and report superior performance (+25% in Mean Average Precision) over standard Support Vector Machine based approaches.
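
The rooted-tree representation and the idea of fusing similarities across layers can be illustrated with a minimal Python sketch. This is not the MLMI kernel defined in the paper, whose exact form the abstract does not give. It is a simplified stand-in that assumes each node carries a hypothetical feature vector, compares nodes only at the same depth with an RBF base kernel, and adds the averaged child-to-child similarities in the style of a normalized set kernel. The names `Node`, `rbf`, `tree_kernel`, and the parameter `gamma` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from math import exp
from typing import List


@dataclass
class Node:
    """One node of the rooted tree: a shot, a key-frame, or a region."""
    layer: str                      # "shot", "keyframe", or "region"
    feature: List[float]            # hypothetical low-level feature vector
    children: List["Node"] = field(default_factory=list)


def rbf(x: List[float], y: List[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) base kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * sq_dist)


def tree_kernel(a: Node, b: Node, gamma: float = 1.0) -> float:
    """Illustrative recursive kernel between two rooted trees.

    The similarity of two nodes is their base-kernel value plus the
    average similarity over all pairs of their children, so each
    "bag" (parent) is compared through its "instances" (children).
    """
    k = rbf(a.feature, b.feature, gamma)
    if a.children and b.children:
        pair_sum = sum(
            tree_kernel(ca, cb, gamma)
            for ca in a.children
            for cb in b.children
        )
        k += pair_sum / (len(a.children) * len(b.children))
    return k


# Two toy shots, each with one key-frame that contains two regions.
shot_a = Node("shot", [0.0], [
    Node("keyframe", [0.0], [Node("region", [0.0]), Node("region", [1.0])]),
])
shot_b = Node("shot", [1.0], [
    Node("keyframe", [1.0], [Node("region", [1.0]), Node("region", [2.0])]),
])

print(tree_kernel(shot_a, shot_a), tree_kernel(shot_a, shot_b))
```

A Gram matrix built from such a function could in principle be passed to a kernel classifier that accepts precomputed kernels. How the paper actually weights the layers and handles comparisons between different layers is not specified in the abstract, so those choices are left out here.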

Tao Mei | Xian-Sheng Hua | Jinhui Tang | Xiuqing Wu | Zhiwei Gu
