Mind the gap: another look at the problem of the semantic gap in image retrieval

This paper reviews and characterises the problem of the semantic gap in image retrieval and the attempts being made to bridge it. In particular, we draw on our own experience with user queries, automatic annotation and ontological techniques. The first section of the paper characterises the semantic gap as a hierarchy between the raw media and a full semantic understanding of the media's content. The second section discusses real users' queries with respect to the semantic gap. The final sections describe our own attempts to bridge the gap. In particular, we discuss our work on auto-annotation and semantic-space models of image retrieval as techniques for bridging the gap from the bottom up, and the use of ontologies, which capture more semantics than keyword object labels alone, as a technique for bridging the gap from the top down.
