Indexing and Integrating Multiple Features for WWW Images

In this paper, we present a novel indexing technique called Multi-scale Similarity Indexing (MSI) that indexes an image's multiple features in a single one-dimensional structure. For both the text and the visual feature spaces, the similarity between a point and the center of its local partition in that space is used as the indexing key, and similarity values from different features are kept apart by assigning them different scales. A single index tree can then be built on these keys. Because relevant images have similar similarity values with respect to the center of the same local partition in any feature space, a portion of the irrelevant images can be pruned quickly by applying the triangle inequality to the indexing keys. To alleviate the “curse of dimensionality” in high-dimensional structures, we propose a new technique called Local Bit Stream (LBS). LBS transforms an image's text and visual feature representations into simple, uniform and effective bit stream (BS) representations relative to the center of its local partition. These BS representations are small and fast to compare, since only bit operations are involved. By comparing the common bits of two BSs, most irrelevant images can be filtered out immediately. To integrate multiple features effectively, we also investigate the following evidence combination techniques: Certainty Factor, Dempster-Shafer Theory, Compound Probability, and Linear Combination. Our extensive experiments show that a single one-dimensional index over multiple features substantially outperforms multiple indices over the same features. Our LBS method outperforms sequential scan in high-dimensional spaces by an order of magnitude, and Certainty Factor and Dempster-Shafer Theory perform best at combining the similarities obtained from the corresponding features.
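The abstract describes these mechanisms without formulas, so the Python sketch below only illustrates them under assumptions that are not taken from the paper. It uses Euclidean distance to the nearest partition center (rather than a similarity) as the key, gives each (feature, partition) pair a disjoint key interval via the hypothetical constants `SCALE` and `MAX_PARTITIONS` (an iDistance-style layout), and stands in a sorted Python list for the single B+-tree. The bit stream in `lbs` (bit j set when a value is at least the center's value in dimension j) is a hypothetical encoding, and `combine_cf` applies the MYCIN rule for non-negative certainty factors. All function names are illustrative.

```python
import math
from bisect import bisect_left, bisect_right

# Hypothetical constants (not from the paper): each (feature, partition) pair owns
# the key interval [base, base + SCALE); SCALE must exceed every point-to-center
# distance, and MAX_PARTITIONS bounds the number of partitions per feature.
SCALE = 1000.0
MAX_PARTITIONS = 10


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def base_key(feature_id, partition):
    """Start of the key interval reserved for one partition of one feature."""
    return (feature_id * MAX_PARTITIONS + partition) * SCALE


def build_msi(vectors, centers):
    """vectors[f][img] is image img's vector in feature space f; centers[f] lists
    that space's partition centers. All features share one sorted key list,
    which stands in for the single one-dimensional index tree."""
    entries = []
    for f, space in vectors.items():
        for img, v in space.items():
            p = min(range(len(centers[f])), key=lambda i: dist(v, centers[f][i]))
            entries.append((base_key(f, p) + dist(v, centers[f][p]), f, img))
    entries.sort()
    return [e[0] for e in entries], [(e[1], e[2]) for e in entries]


def range_query(index, vectors, centers, f, q, r):
    """Images whose feature-f vector lies within distance r of query q."""
    keys, refs = index
    hits = set()
    for p, c in enumerate(centers[f]):
        d_q = dist(q, c)
        base = base_key(f, p)
        # Triangle inequality: |d(x,c) - d(q,c)| <= d(x,q), so an image whose key
        # falls outside [d_q - r, d_q + r] in this partition is pruned unseen.
        lo = bisect_left(keys, base + max(0.0, d_q - r))
        hi = bisect_right(keys, base + d_q + r)
        for feat, img in refs[lo:hi]:
            if feat == f and dist(vectors[f][img], q) <= r:  # exact refinement
                hits.add(img)
    return sorted(hits)


def lbs(v, center):
    """Hypothetical Local Bit Stream: bit j is set when v[j] >= center[j]."""
    bits = 0
    for j, (x, c) in enumerate(zip(v, center)):
        if x >= c:
            bits |= 1 << j
    return bits


def common_bits(a, b, n_dims):
    """Positions where two n_dims-bit streams agree (XOR, then count set bits)."""
    return n_dims - bin(a ^ b).count("1")


def combine_cf(cfs):
    """MYCIN-style combination of non-negative certainty factors in [0, 1]."""
    total = 0.0
    for cf in cfs:
        total += cf * (1.0 - total)
    return total


if __name__ == "__main__":
    centers = {0: [(0.0, 0.0), (10.0, 10.0)],  # feature 0, e.g. a visual space
               1: [(0.0, 0.0, 0.0)]}           # feature 1, e.g. a text space
    vectors = {0: {"a": (1.0, 0.0), "b": (9.0, 9.0), "c": (0.0, 3.0)},
               1: {"a": (1.0, 1.0, 0.0), "b": (5.0, 0.0, 0.0), "c": (0.0, 1.0, 1.0)}}
    index = build_msi(vectors, centers)
    print(range_query(index, vectors, centers, 0, (0.5, 0.5), 1.0))
    print(common_bits(lbs((1, 2, 0), (0, 0, 0)), lbs((3, 0, -1), (0, 0, 0)), 3))
    print(combine_cf([0.6, 0.5]))
```

In this sketch the key-range pruning never discards a true match, because every candidate surviving the key range is refined with an exact distance check. A threshold on `common_bits`, by contrast, would act as a heuristic filter that can discard relevant images; the abstract does not state which rule LBS actually applies.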
