Image and Video Indexing Using Networks of Operators

This article presents a framework for the design of concept detection systems for image and video indexing. This framework integrates in a homogeneous way all the data and processing types. The semantic gap is crossed in a number of steps, each producing a small increase in the abstraction level of the handled data. All the data inside the semantic gap and on both sides included are seen as a homogeneous type called numcept and all the processing modules between the various numcepts are seen as a homogeneous type called operator. Concepts are extracted from the raw signal using networks of operators operating on numcepts. These networks can be represented as data-flow graphs and the introduced homogenizations allow fusing elements regardless of their nature. Low-level descriptors can be fused with intermediate of final concepts. This framework has been used to build a variety of indexing networks for images and videos and to evaluate many aspects of them. Using annotated corpora and protocols of the 2003 to 2006 TRECVID evaluation campaigns, the benefit brought by the use of individual features, the use of several modalities, the use of various fusion strategies, and the use of topologic and conceptual contexts was measured. The framework proved its efficiency for the design and evaluation of a series of network architectures while factorizing the training effort for common sub-networks.

[1]  John R. Smith,et al.  On the detection of semantic concepts at TRECVID , 2004, MULTIMEDIA '04.

[2]  Paul Over,et al.  TRECVID 2005 - An Overview , 2005, TRECVID.

[3]  Dennis Koelma,et al.  The MediaMill TRECVID 2008 Semantic Video Search Engine , 2008, TRECVID.

[4]  Harriet J. Nock,et al.  Discriminative model fusion for semantic concept detection and annotation in video , 2003, ACM Multimedia.

[5]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[6]  John W. Backus,et al.  Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs , 1978, CACM.

[7]  Jeff Z. Pan,et al.  Multimedia annotations on the semantic Web , 2006, IEEE Multimedia.

[8]  Stéphane Ayache,et al.  Context-Based Conceptual Image Indexing , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[9]  Stéphane Ayache,et al.  Using Topic Concepts for Semantic Video Shots Classification , 2006, CIVR.

[10]  G. Quénot,et al.  CLIPS-LSR Experiments at TRECVID 2006 , 2006, TRECVID.

[11]  Harriet J. Nock,et al.  Semantic indexing of multimedia using audio, text and visual cues , 2002, Proceedings. IEEE International Conference on Multimedia and Expo.

[12]  Sheng Tang,et al.  TRECVID 2006 by NUS-I2R , 2006, TRECVID.

[13]  Martial Hebert,et al.  Discriminative random fields: a discriminative framework for contextual interaction in classification , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[14]  Nello Cristianini,et al.  Kernel-Based Data Fusion and Its Application to Protein Function Prediction in Yeast , 2003, Pacific Symposium on Biocomputing.

[15]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[16]  QU GeorgesM Computation of Optical Flow using Dynamic Programming , 1996 .

[17]  Matthieu Cord,et al.  A comparison of active classification methods for content-based image retrieval , 2004, CVDB '04.

[18]  Marcel Worring,et al.  The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[20]  Tobun Dorbin Ng,et al.  Informedia at TRECVID 2003 : Analyzing and Searching Broadcast News Video , 2003, TRECVID.

[21]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[22]  Marcel Worring,et al.  Learning rich semantics from news video archives by style analysis , 2006, TOMCCAP.

[23]  Stéphane Ayache,et al.  Classifier Fusion for SVM-Based Multimedia Semantic Indexing , 2007, ECIR.

[24]  Ching-Yung Lin,et al.  Video Collaborative Annotation Forum: Establishing Ground-Truth Labels on Large Multimedia Datasets , 2003, TRECVID.

[25]  Brendan J. Frey,et al.  Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[26]  Bertrand Zavidovique,et al.  Massively parallel data flow computer dedicated to real-time image processing , 1997 .

[27]  Milind R. Naphade On supervision and statistical learning for semantic multimedia analysis , 2004, J. Vis. Commun. Image Represent..