Annotation of still images by multiple visual concepts

Automatic indexing of images and videos is an important research area in multimedia information retrieval, and the difficulty of the task is well established. Most past efforts of the research community have focused on detecting single concepts in images and videos, which is already a hard task. As information retrieval systems evolve, users' needs become more abstract and lead to queries composed of a larger number of words. It is therefore sensible to index multimedia documents by more than one concept, to help retrieval systems answer such complex queries. Few studies have specifically addressed the problem of detecting multiple concepts (multi-concept) in images and videos, and most of them concern the detection of concept pairs. These studies showed that this challenge is even greater than that of single-concept detection. In this work, we address the problem of multi-concept detection in still images. Two types of approaches are considered: 1) building a model per multi-concept and 2) fusing single-concept detectors. We conducted our evaluation on the PASCAL VOC'12 collection for the detection of pairs and triplets of concepts. Our results show that the two types of approaches give globally comparable results, but that they differ for specific kinds of pairs/triplets.
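
As an illustration of the second type of approach, the sketch below fuses the scores of single-concept detectors into one multi-concept score per image and ranks images by it. It is a minimal sketch under stated assumptions, not the method evaluated in this work: the abstract does not say which fusion operators are used, so the product, minimum, and arithmetic mean shown here are common late-fusion choices picked for illustration. The function names `fuse_scores` and `rank_images`, the image identifiers, and the score values are all hypothetical, and detector outputs are assumed to be probabilities in [0, 1].

```python
import math


def fuse_scores(scores, method="product"):
    """Combine single-concept probabilities for one image into one score.

    The three operators are illustrative assumptions, not the paper's choice.
    """
    if not scores:
        raise ValueError("at least one single-concept score is required")
    if method == "product":
        # Treats the concepts as independent events.
        return math.prod(scores)
    if method == "min":
        # The weakest detector decides.
        return min(scores)
    if method == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown fusion method: {method}")


def rank_images(detector_scores, concepts, method="product"):
    """Return image ids sorted by decreasing fused score for the given concepts."""
    fused = {
        image_id: fuse_scores([scores[c] for c in concepts], method)
        for image_id, scores in detector_scores.items()
    }
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical detector outputs for three images (made-up values).
detector_scores = {
    "img_a": {"dog": 0.9, "person": 0.8, "car": 0.1},
    "img_b": {"dog": 0.2, "person": 0.95, "car": 0.7},
    "img_c": {"dog": 0.6, "person": 0.6, "car": 0.6},
}

pair_ranking = rank_images(detector_scores, ("dog", "person"))
triplet_ranking = rank_images(detector_scores, ("dog", "person", "car"), method="min")
```

With these made-up scores, the pair ("dog", "person") ranks `img_a` first, while the triplet that adds "car" ranks `img_c` first. The first type of approach would instead train one dedicated classifier per pair or triplet on images labelled with all of its concepts.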
