A three-level architecture for bridging the image semantic gap

Image retrieval systems must cope with the many ways in which image content can be apprehended, and in particular with the difficulty of characterizing visual semantics. To address this issue, we examine the use of three levels of abstraction, namely Signal, Object and Semantic. At the Signal level, we propose a framework that maps extracted low-level features to symbolic signal descriptors. The Object level features a statistical model of the joint distribution of object concepts (such as mountain, sky, etc.) and the symbolic signal descriptors. At the Semantic level, signal and object characterizations are coupled within a logic-based framework, instantiated by a knowledge representation formalism that supports an expressive query language with several Boolean and quantification operators. Our architecture therefore makes it possible to process topic-based queries. We evaluate our proposal experimentally on a corpus of real-world photographs and on the TRECVid corpus.
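
To make the three levels concrete, the following is a minimal Python sketch of how such an architecture could be wired together. The hue-to-color-term mapping, the co-occurrence counts, and the query operators (`and`, `or`, `concept`, `color`) are invented for illustration only; the paper's actual signal-to-symbol framework, statistical model and logic-based formalism are not reproduced here.

```python
from collections import defaultdict

# --- Signal level: map low-level features to symbolic signal descriptors ---
# Hypothetical mapping of hue ranges (degrees) to color terms, standing in for
# symbolic signal descriptors derived from low-level features.
COLOR_TERMS = [(0, 30, "red"), (30, 90, "yellow"), (90, 150, "green"),
               (150, 210, "cyan"), (210, 270, "blue"), (270, 330, "magenta"),
               (330, 360, "red")]

def signal_descriptors(mean_hue: float) -> set:
    """Return the symbolic color term(s) covering a region's mean hue."""
    return {name for lo, hi, name in COLOR_TERMS if lo <= mean_hue < hi}

# --- Object level: toy joint distribution of concepts and descriptors ---
# Counts of (object concept, signal descriptor) co-occurrences; a real model
# would estimate this joint distribution from annotated training regions.
joint_counts = defaultdict(int)
joint_counts[("sky", "blue")] = 40
joint_counts[("sky", "cyan")] = 10
joint_counts[("mountain", "green")] = 25
joint_counts[("mountain", "blue")] = 5

def p_concept_given_descriptor(concept: str, descriptor: str) -> float:
    """Conditional probability of an object concept given a signal descriptor."""
    total = sum(c for (_, d), c in joint_counts.items() if d == descriptor)
    return joint_counts[(concept, descriptor)] / total if total else 0.0

# --- Semantic level: evaluate simple Boolean queries over region facts ---
def satisfies(image_regions, query):
    """query: nested tuples, e.g. ("and", ("concept", "sky"), ("color", "blue"))."""
    op = query[0]
    if op == "and":
        return all(satisfies(image_regions, q) for q in query[1:])
    if op == "or":
        return any(satisfies(image_regions, q) for q in query[1:])
    if op == "concept":
        return any(query[1] in r["concepts"] for r in image_regions)
    if op == "color":
        return any(query[1] in r["colors"] for r in image_regions)
    raise ValueError(f"unknown operator {op}")

# Usage: one image with two annotated regions.
image = [{"concepts": {"sky"}, "colors": signal_descriptors(225.0)},
         {"concepts": {"mountain"}, "colors": signal_descriptors(120.0)}]
print(satisfies(image, ("and", ("concept", "sky"), ("color", "blue"))))  # True
print(round(p_concept_given_descriptor("sky", "blue"), 2))               # 0.89
```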
