A structured learning framework for content-based image indexing and visual query

Abstract. Nonspecific images in a broad domain remain a challenge for content-based image retrieval. As a typical example, consumer photos exhibit highly varied content, diverse resolutions, and inconsistent quality. The objects are usually ill-posed, occluded, and cluttered, with poor lighting, focus, and exposure. Traditional image retrieval approaches face many obstacles, such as semantic description of images, robust semantic object segmentation, the small sampling problem, and semantic gaps between low-level features and high-level semantics.

To manage the high diversity of images in a broad domain, we propose a structured learning framework to systematically design domain-relevant visual semantics, known as semantic support regions, to support indexing and querying in a content-based image retrieval system. Semantic support regions are segmentation-free image regions that exhibit semantic meanings and that can be learned statistically to span a new indexing space. They are detected from image content, reconciled across multiple resolutions, and aggregated spatially to form local semantic histograms. The resulting compact and abstract representation can support both similarity-based query and compositional visual query efficiently. The query by spatial icons (QBSI) formulation is a unique visual query language that explicitly specifies visual icons and spatial extents in a Boolean expression.

For empirical evaluation, we perform the learning and indexing processes of 26 semantic support regions over 2400 heterogeneous consumer photos from a single family using Support Vector Machines. We report a $27\%$ improvement in average precision over a very high-dimensional feature-based approach on 24 semantic queries based on multiple examples and pooled ground truths. Finally, we demonstrate the usefulness of the visual query language with 15 QBSI queries that attain high precision values at the top retrieved images on the 2400 consumer images.
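The spatial aggregation and the Boolean query over icons and spatial extents can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that per-block class scores for the semantic support regions are already available (for example, from SVM outputs), that aggregation is a plain average within each spatial tile, and that a QBSI expression in disjunctive normal form is scored with fuzzy min for AND and max for OR. The names `local_histograms`, `qbsi_score`, and `Term` are hypothetical.

```python
from typing import List, Sequence, Tuple

# rows x cols grid of image blocks, each holding K class scores (assumed given)
Grid = List[List[Sequence[float]]]
# (icon index, tile row, tile col): one "icon at spatial extent" term (hypothetical form)
Term = Tuple[int, int, int]


def local_histograms(scores: Grid, tiles: int) -> List[List[List[float]]]:
    """Average block scores inside each cell of a tiles x tiles spatial layout.

    Assumes the grid has at least `tiles` rows and columns, so no cell is empty.
    """
    rows, cols = len(scores), len(scores[0])
    k = len(scores[0][0])
    out = []
    for ti in range(tiles):
        r0, r1 = ti * rows // tiles, (ti + 1) * rows // tiles
        row_out = []
        for tj in range(tiles):
            c0, c1 = tj * cols // tiles, (tj + 1) * cols // tiles
            blocks = [scores[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row_out.append([sum(b[i] for b in blocks) / len(blocks) for i in range(k)])
        out.append(row_out)
    return out


def qbsi_score(hists: List[List[List[float]]], query: List[List[Term]]) -> float:
    """Score a query given as an OR (max) over ANDs (min) of icon/tile terms.

    The min/max interpretation is an assumption made for this sketch.
    """
    return max(min(hists[r][c][icon] for icon, r, c in conj) for conj in query)
```

With two classes, say index 0 for "sky" and index 1 for "grass", the query `[[(0, 0, 0), (1, 1, 1)]]` asks for sky in the top-left tile and grass in the bottom-right tile. Images would then be ranked by the returned score.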
