Using Topic Concepts for Semantic Video Shots Classification

Automatic semantic classification of video databases is very useful for users searching and browsing, but it is also a very challenging research problem. Combining the visual and text modalities is one of the key issues in bridging the semantic gap between the signal and its semantics. In this paper, we propose to enhance the classification of high-level concepts using intermediate topic concepts, and we study various fusion strategies that combine topic concepts with visual features in order to outperform unimodal classifiers. We conducted several experiments on the TRECVID'05 collection and show that several intermediate topic classifiers can bridge part of the semantic gap and help to detect high-level concepts.
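The abstract does not say which fusion strategies were compared or how they were parameterized. As a minimal, hypothetical illustration, assume each video shot is described by a visual feature vector and by a vector of scores from the intermediate topic classifiers. Two generic ways to combine them are early fusion (normalize each modality and concatenate the vectors before training a single concept classifier) and late fusion (train one classifier per modality and take a weighted average of their output scores). The function names and the weight `alpha` below are illustrative assumptions, not the authors' method.

```python
from typing import Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)


def early_fusion(visual: Sequence[float], topics: Sequence[float]) -> list[float]:
    """Early fusion (hypothetical): normalize each modality separately so neither
    dominates by scale, then concatenate them into one feature vector on which a
    single high-level concept classifier would be trained."""
    return l2_normalize(visual) + l2_normalize(topics)


def late_fusion(visual_score: float, topic_score: float, alpha: float = 0.5) -> float:
    """Late fusion (hypothetical): combine the scores of a visual-only classifier and
    a topic-concept classifier with a convex weight alpha in [0, 1], which would
    typically be tuned on held-out data."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * visual_score + (1.0 - alpha) * topic_score
```

For example, `early_fusion([3.0, 4.0], [0.0, 2.0])` returns `[0.6, 0.8, 0.0, 1.0]`, and `late_fusion(1.0, 0.5, alpha=0.25)` returns `0.625`. Early fusion lets one classifier learn interactions between modalities, while late fusion keeps the unimodal classifiers independent and only learns how much to trust each one.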
