Bridging the Semantic Gap using Human Vision System Inspired Features

In the last decade, digital imaging has experienced a worldwide revolution in both the number of users and the range of applications, and the amount of digital image content produced daily continues to increase drastically. From the very beginning of photography, those who took pictures have tried to record as much information as possible about each photograph, and in today's digital age the need for appending metadata is even greater. However, manually annotating images is a cumbersome, time-consuming, and expensive task for large image databases, and it is often subjective, context-sensitive, and incomplete. Furthermore, traditional text-based methods struggle to support a variety of task-dependent queries when relying solely on textual metadata, since visual information is a more capable medium for conveying ideas and is more closely related to human perception of the real world.

The dynamic characteristics of images require sophisticated methodologies for data visualization, indexing, and similarity management and, as a result, have attracted significant research effort toward tools for content-based retrieval of visual data. Content-based image retrieval uses the visual contents of an image, such as color, shape, texture, and spatial layout, to represent and index the image. Early content-based image retrieval systems were based on the search for the best match to a user-provided query image or sketch (Flickner et al., 1995; Mehrotra et al., 1997; Laaksonen et al., 2002). Such systems decompose each image into a number of low-level visual features (e.g., color histograms, edge information), and the retrieval process is formulated as the search for the best match to the feature vector(s) extracted from a query image.

However, it was quickly realized that a fully functional retrieval system would require support for semantic queries (Picard, 1995). The basic idea is to automatically associate semantic keywords with each image by building models of the visual appearance of the semantic concepts of interest. The critical obstacle in the advancement of content-based image retrieval is the semantic gap, the major discrepancy in computer vision: the user wants to retrieve images on a semantic level, but the image characterizations can only provide low-level similarity. As a result, describing high-level semantic concepts with low-level visual features is a challenging task. The first efforts targeted the extraction of specific semantics under the framework of binary classification, such as indoor versus outdoor (Szummer & Picard, 1998) and city versus landscape (Vailaya et al., 1998).
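To make the low-level matching concrete, the following is a minimal sketch of query-by-example retrieval with color histograms, written in Python with NumPy. All function names and the randomly generated "image database" are hypothetical illustrations, not any particular system's implementation; the early systems cited above combined several such descriptors.

    import numpy as np

    def color_histogram(image, bins=8):
        """Quantize each RGB channel into `bins` levels and count joint
        occurrences, giving a bins**3-dimensional descriptor."""
        # image: H x W x 3 uint8 array
        quantized = (image.astype(np.uint32) * bins) // 256  # per-channel bin index, 0..bins-1
        codes = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
        hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(np.float64)
        return hist / hist.sum()  # L1-normalize so image size does not matter

    def retrieve(query, database, k=5):
        """Rank database images by histogram-intersection similarity to
        the query and return the indices of the k best matches."""
        q = color_histogram(query)
        scores = [np.minimum(q, color_histogram(img)).sum() for img in database]
        return np.argsort(scores)[::-1][:k]  # higher intersection = more similar

    # Hypothetical usage with random "images"; the query itself ranks first.
    rng = np.random.default_rng(0)
    db = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(20)]
    print(retrieve(db[3], db, k=3))

All such a sketch can do is rank images by low-level similarity; recognizing that two visually dissimilar images both depict, say, a beach is precisely what this kind of matching leaves out, which is the semantic gap described above.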

References

[1] Peter Lambert et al. Unsupervised texture segmentation and labeling using biologically inspired features, 2008, IEEE 10th Workshop on Multimedia Signal Processing.

[2] Nicolai Petkov et al. Nonlinear operator for oriented texture, 1999, IEEE Transactions on Image Processing.

[3] Martin Szummer et al. Indoor-outdoor image classification, 1998, Proceedings of the 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[4] James Ze Wang et al. Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach, 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Anil K. Jain et al. On image classification: city vs. landscape, 1998, Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries.

[6] Pasi Koikkalainen et al. Self-organizing hierarchical feature maps, 1990, IJCNN International Joint Conference on Neural Networks.

[7] Thomas S. Huang et al. Supporting content-based queries over images in MARS, 1997, Proceedings of the IEEE International Conference on Multimedia Computing and Systems.

[8] Tieniu Tan et al. Font Recognition Based on Global Texture Analysis, 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Myron Flickner et al. Query by Image and Video Content, 1995.

[10] R. Manmatha et al. Multiple Bernoulli relevance models for image and video annotation, 2004, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Jitendra Malik et al. When is scene recognition just texture recognition?, 2010.

[12] Tieniu Tan et al. Invariant texture segmentation via circular Gabor filters, 2002, Proceedings of the 16th International Conference on Pattern Recognition.

[13] J. P. Jones et al. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex, 1987, Journal of Neurophysiology.

[14] David A. Forsyth et al. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, 2002, ECCV.

[15] J. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, 1985, Journal of the Optical Society of America A: Optics and Image Science.

[16] Lei Zhu et al. Keyblock: an approach for content-based image retrieval, 2000, ACM Multimedia.

[17] Erkki Oja et al. PicSOM: self-organizing image retrieval with MPEG-7 content descriptors, 2002, IEEE Transactions on Neural Networks.

[18] Rosalind W. Picard. Digital Libraries: Meeting Place for High-Level and Low-Level Vision, 1995.

[19] Dragutin Petkovic et al. Query by Image and Video Content: The QBIC System, 1995, IEEE Computer.

[20] Anil K. Jain et al. Unsupervised texture segmentation using Gabor filters, 1990, IEEE International Conference on Systems, Man, and Cybernetics.

[21] Aidong Zhang et al. Semantics Retrieval by Content and Context of Image Regions, 2002.

[22] Markus Turtinen et al. Visual Training and Classification of Textured Scene Images, 2003.

[23] Wilson S. Geisler et al. Multichannel Texture Analysis Using Localized Spatial Filters, 1990, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24] T. Poggio et al. How Visual Cortex Recognizes Objects: The Tale of the Standard Model, 2002.

[25] Nicolai Petkov et al. Grating cell operator features for oriented texture segmentation, 1998, Proceedings of the Fourteenth International Conference on Pattern Recognition.

[26] David A. Clausi et al. Designing Gabor filters for optimal texture separability, 2000, Pattern Recognition.