Learning the Semantics in Image Retrieval - A Natural Language Processing Approach

Learning image semantics from both textual and visual information is a challenging research problem in content-based image retrieval. In this paper, we present a statistical natural language processing model for image retrieval that integrates low-level visual features with semantic information from WordNet, an online lexical reference system. Our system uses WordNet's semantic hierarchy of word senses to strengthen the association between images and the textual description of a concept. A statistical keyword selection algorithm then chooses the most representative keywords to annotate the images of that concept. We test the model on a landscape image database covering 10 different concepts. The experimental results show that our approach can substantially improve retrieval accuracy. They also indicate that the approach has strong potential for building ontologies of image databases.
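The abstract does not specify the keyword selection algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: each description word is expanded with its hypernyms, and the expanded terms are ranked per concept with a TF-IDF-style score. The `HYPERNYMS` table is a hand-written stand-in for WordNet, and the names `expand` and `select_keywords`, the scoring formula, and the sample descriptions are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical hand-written hypernym links standing in for WordNet's hierarchy.
HYPERNYMS = {
    "peak": "mountain",
    "summit": "mountain",
    "mountain": "landform",
    "shore": "beach",
    "beach": "landform",
}


def expand(word):
    """Return the word followed by its chain of hypernyms."""
    chain = [word]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def select_keywords(descriptions, concept, top_k=2):
    """Rank expanded terms for `concept` by an assumed TF-IDF-style score."""
    # Term counts per concept, after hypernym expansion of every word.
    counts = {
        name: Counter(t for doc in docs for w in doc.split() for t in expand(w))
        for name, docs in descriptions.items()
    }
    n_concepts = len(counts)
    scores = {}
    for term, tf in counts[concept].items():
        # Number of concepts whose descriptions contain the term.
        df = sum(1 for c in counts.values() if term in c)
        # Terms shared by every concept score zero and drop to the bottom.
        scores[term] = tf * math.log(n_concepts / df)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Made-up descriptions for two landscape concepts.
descriptions = {
    "mountain": ["snowy peak", "rocky summit", "mountain peak"],
    "beach": ["sandy shore", "sunny beach"],
}

print(select_keywords(descriptions, "mountain"))  # ['mountain', 'peak']
```

In this toy data, hypernym expansion lets "peak" and "summit" both count toward "mountain", while "landform" is shared by both concepts and so receives a zero score. This illustrates how a hierarchy might reinforce concept-specific keywords; it says nothing about how the paper combines keywords with visual features.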
