Study on the combination of video concept detectors

This paper studies how to combine video concept detectors using a labeled fusion set. We point out that the computational cost of a grid search over fusion weights grows exponentially with the number of detectors, which makes it infeasible for a large number of detectors. To avoid this difficulty, we adopt an incremental fusion approach: in each round only two detectors are combined, so only a 1-dimensional grid search is needed. We propose a Bottom-Up Incremental Fusion (BUIF) method, which repeatedly selects the detectors with the lowest performance for combination. We conduct experiments on the TRECVID benchmark dataset for 39 concepts with 38 detection methods. Ten fusion strategies are compared, and the empirical results demonstrate the superiority of the proposed incremental fusion approach.
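The incremental idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: detector outputs are plain score lists over the labeled fusion set, performance is measured by average precision, two detectors are fused by a weighted sum `w * a + (1 - w) * b`, and the weight `w` is searched on a uniform grid. All function names are hypothetical.

```python
def average_precision(scores, labels):
    """Average precision of a ranking induced by descending scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def fuse_pair(a, b, labels, steps=10):
    """1-dimensional grid search over the weight of a two-detector fusion.

    Assumption: linear fusion w*a + (1-w)*b with w in {0, 1/steps, ..., 1}.
    Returns (best_ap, fused_scores).
    """
    best = None
    for k in range(steps + 1):
        w = k / steps
        fused = [w * x + (1 - w) * y for x, y in zip(a, b)]
        ap = average_precision(fused, labels)
        if best is None or ap > best[0]:
            best = (ap, fused)
    return best


def bottom_up_incremental_fusion(detectors, labels, steps=10):
    """Repeatedly fuse the two lowest-performing detectors until one remains.

    Returns (ap_on_fusion_set, fused_scores).
    """
    pool = [(average_precision(s, labels), list(s)) for s in detectors]
    while len(pool) > 1:
        pool.sort(key=lambda p: p[0])          # lowest performance first
        (_, a), (_, b) = pool[0], pool[1]      # the two weakest detectors
        pool = [fuse_pair(a, b, labels, steps)] + pool[2:]
    return pool[0]
```

Under these assumptions, fusing n detectors takes n - 1 rounds of `steps + 1` evaluations each. With 38 detectors and 11 grid points that is 37 × 11 = 407 fused rankings, whereas a joint grid over all weights would have on the order of 11^38 combinations.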
