Multi-granular detection of regional semantic concepts [video annotation]

Many visual semantic concepts of interest occupy one or more regions within a frame rather than the frame as a whole. Detecting these concepts is challenging because segmentation is imperfect. We propose multi-granular detection of visual concepts that have regional support. We build a single set of binary concept models, based on support vector machines, from a training set with manually marked-up regions. We then score these models over multiple granularities in the test-set images, where the regions are detected automatically as a preprocessing step. Using 27 regional semantic concepts from the NIST TRECVID 2003 common annotation lexicon and corpus, we demonstrate that multi-granular scoring significantly improves detection.
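
To make the scoring step concrete, the following is a minimal sketch of applying one concept model to regions at several granularities and combining the responses. It is an illustration under stated assumptions, not the authors' implementation: the linear decision function stands in for a trained support vector machine, the granularity names ("frame", "coarse", "fine") and the 2-D feature vectors are hypothetical, and taking the maximum over all regions is an assumed fusion rule that the abstract does not specify.

```python
from typing import Callable, Dict, List, Sequence

Feature = Sequence[float]
Model = Callable[[Feature], float]


def linear_score(weights: Feature, bias: float) -> Model:
    """Return a decision function w.x + b (a stand-in for a trained binary SVM)."""
    def score(x: Feature) -> float:
        return sum(w * v for w, v in zip(weights, x)) + bias
    return score


def multi_granular_score(model: Model,
                         regions_by_granularity: Dict[str, List[Feature]]) -> float:
    """Score every region at every granularity and keep the strongest response.

    Max fusion is an assumption made for this sketch.
    """
    scores = [model(feature)
              for regions in regions_by_granularity.values()
              for feature in regions]
    if not scores:
        raise ValueError("at least one region is required")
    return max(scores)


# Hypothetical concept model and hypothetical region features for one test image.
concept_model = linear_score(weights=[1.0, -0.5], bias=-0.2)
image_regions = {
    "frame": [[0.4, 0.6]],                            # whole frame as one region
    "coarse": [[0.9, 0.2], [0.1, 0.8]],               # a few large regions
    "fine": [[1.0, 0.0], [0.3, 0.3], [0.0, 1.0]],     # many small regions
}

frame_only = multi_granular_score(concept_model, {"frame": image_regions["frame"]})
combined = multi_granular_score(concept_model, image_regions)
```

With max fusion, the combined score can never be lower than the score from any single granularity, so a concept that is diluted at the frame level can still be picked up from a finer region.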
