Video diver: generic video indexing with diverse features

Semantic video indexing is critical for practical video retrieval systems, and a generic, scalable indexing framework is essential for indexing a large semantic lexicon of over 1000 concepts. This paper explores the idea of incorporating many kinds of diverse features into a single framework and combining them to obtain a degree of invariance that is absent from any individual feature, thereby achieving genericness and scalability. We reduce the formidable computational expense through the design of the classification and fusion schemes. Specifically, about 20 kinds of diverse features are extracted to capture limited yet complementary variance in color, texture, and edge, with spatial constraints implicitly integrated; over 100 classifiers are then built and fused to produce a generic detector. Extensive experiments on a total of 310 hours of TRECVID news videos show that the proposed framework significantly outperforms the best single feature across a variety of concepts. Moreover, a benchmark comparison demonstrates that the approach is state-of-the-art. The proposed approach also generalizes well to previously unseen programs and stations and scales well to a lexicon of over 300 concepts in the LSCOM [18] ontology.
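The abstract does not specify how the classifiers are fused, so the following is only a minimal sketch of one common late-fusion scheme, included to make the idea of combining per-feature classifier outputs concrete. It assumes each feature-specific classifier has already produced one score per video shot, rescales each classifier's scores to [0, 1], and averages them. The function names (`min_max_normalize`, `fuse_scores`, `rank_shots`) and the feature names in the usage example are hypothetical and do not come from the paper.

```python
from typing import Dict, List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale one classifier's scores to [0, 1] so classifiers are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant classifier carries no ranking information; map it to 0.5.
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(per_feature_scores: Dict[str, List[float]]) -> List[float]:
    """Average normalized scores across classifiers (assumed fusion rule)."""
    if not per_feature_scores:
        raise ValueError("at least one classifier is required")
    normalized = [min_max_normalize(s) for s in per_feature_scores.values()]
    n_shots = len(normalized[0])
    if any(len(s) != n_shots for s in normalized):
        raise ValueError("all classifiers must score the same shots")
    # zip(*normalized) groups the scores that each shot received.
    return [sum(column) / len(normalized) for column in zip(*normalized)]


def rank_shots(fused: List[float]) -> List[int]:
    """Return shot indices ordered from highest to lowest fused score."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for three shots from three feature-specific classifiers.
    scores = {
        "color": [0.0, 1.0, 0.5],
        "texture": [10.0, 30.0, 20.0],
        "edge": [2.0, 2.0, 2.0],
    }
    fused = fuse_scores(scores)
    print(rank_shots(fused))
```

Simple averaging is chosen here only because it needs no training data; a learned or weighted fusion would replace the body of `fuse_scores` without changing the rest of the sketch.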

[1]  B. S. Manjunath,et al.  Unsupervised Segmentation of Color-Texture Regions in Images and Video , 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[2]  Dong Wang,et al.  The feature and spatial covariant kernel: adding implicit spatial constraints to histogram , 2007, CIVR '07.

[3]  Dong Xu,et al.  Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction , 2006, TRECVID.

[4]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences , 2013 .

[5]  Dong Wang,et al.  Video search in concept subspace: a text-like paradigm , 2007, CIVR '07.

[6]  Leonidas J. Guibas,et al.  A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[7]  Gary M. Weiss.  Mining with rarity: a unifying framework , 2004, SKDD.

[8]  Jianping Fan,et al.  ClassView: hierarchical video shot classification, indexing, and accessing , 2004, IEEE Transactions on Multimedia.

[9]  J. Platt Sequential Minimal Optimization : A Fast Algorithm for Training Support Vector Machines , 1998 .

[10]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[11]  Alexander J. Smola,et al.  Advances in Large Margin Classifiers , 2000 .

[12]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[13]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[14]  Shih-Fu Chang,et al.  Context-Based Concept Fusion with Boosted Conditional Random Fields , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[15]  Rong Yan,et al.  Multi-Lingual Broadcast News Retrieval , 2006, TRECVID.

[16]  Dennis Koelma,et al.  The MediaMill TRECVID 2008 Semantic Video Search Engine , 2008, TRECVID.

[17]  Marcel Worring,et al.  The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Ronald Fagin,et al.  Comparing top k lists , 2003, SODA '03.

[19]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[20]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[21]  Jun Yang,et al.  CMU Informedia's TRECVID 2005 Skirmishes , 2005, TRECVID.

[22]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2004, ECCV 2004.

[23]  Dong Wang,et al.  Relay Boost Fusion for Learning Rare Concepts in Multimedia , 2006, CIVR.

[24]  Vladimir Vapnik,et al.  The Nature of Statistical Learning , 1995 .

[25]  S. Lazebnik,et al.  Local Features and Kernels for Classification of Texture and Object Categories: An In-Depth Study , 2005 .

[26]  Javed A. Aslam,et al.  Models for metasearch , 2001, SIGIR '01.

[27]  Milind R. Naphade,et al.  Learning the semantics of multimedia queries and concepts from a small number of examples , 2005, MULTIMEDIA '05.

[28]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.