On the surplus value of semantic video analysis beyond the key frame

Typical semantic video analysis methods classify camera shots using features extracted from a single key frame only. In this paper, we sketch a video analysis scenario and evaluate the benefit of analysis beyond the key frame for semantic concept detection performance. We developed detectors for a lexicon of 26 concepts and evaluated their performance on 120 hours of video data. Results show that, on average, detection performance can increase by almost 40% when the analysis method takes more visual content into account.
