Image similarity search with compact data structures

Recent theoretical advances in compact data structures (also called "sketches") have raised the question of whether they can be applied effectively to content-based image retrieval systems. The main challenge is to derive an algorithm that achieves high-quality similarity search while using compact metadata. This paper proposes a new similarity search method consisting of three parts. The first is a new region feature representation with a weighted $\ell_1$ distance function, together with EMD* match, an improved EMD match, to compute image similarity. The second is a thresholding and transformation algorithm that converts feature vectors into very compact data structures. The third is an EMD-embedding-based filtering method to speed up the query process. We have implemented a prototype system with the proposed method and performed experiments with a 10,000-image database. Our results show that the proposed method achieves more effective similarity searches than previous approaches, with metadata 3 to 72 times more compact than that of previous systems. The experiments also show that our EMD-embedding-based filtering technique can speed up the query process by a factor of 5 or more with little loss in query effectiveness.
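
To make the second part more concrete, the following is a minimal sketch of one way a thresholding step can turn a real-valued feature vector into a bit vector whose Hamming distance tracks a weighted $\ell_1$ distance. It is an illustration under stated assumptions, not the algorithm from the paper: the abstract does not give the details, and the function names, the per-dimension value ranges (`lows`, `highs`), and the choice of sampling dimensions in proportion to weight times range are all assumptions made here.

```python
import random

def make_thresholds(weights, lows, highs, n_bits, seed=0):
    """Draw n_bits (dimension, threshold) pairs (hypothetical helper).

    Assumption: a dimension is picked with probability proportional to
    weight * (high - low), and its threshold is uniform in [low, high].
    """
    rng = random.Random(seed)
    spans = [w * (h - l) for w, l, h in zip(weights, lows, highs)]
    dims = rng.choices(range(len(weights)), weights=spans, k=n_bits)
    return [(d, rng.uniform(lows[d], highs[d])) for d in dims]

def sketch(vector, thresholds):
    """One bit per pair: 1 if the value in that dimension exceeds the threshold."""
    return [1 if vector[d] > t else 0 for d, t in thresholds]

def hamming(a_bits, b_bits):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a_bits, b_bits))

def weighted_l1(x, y, weights):
    """Exact weighted l1 distance, used here only for comparison."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))

def estimate_weighted_l1(a_bits, b_bits, weights, lows, highs):
    """Rescale the Hamming distance into an estimate of the weighted l1 distance."""
    total = sum(w * (h - l) for w, l, h in zip(weights, lows, highs))
    return hamming(a_bits, b_bits) * total / len(a_bits)

# Illustrative values only; not data from the paper.
weights = [1.0, 2.0, 0.5]
lows = [0.0, 0.0, 0.0]
highs = [1.0, 1.0, 1.0]
x = [0.1, 0.8, 0.5]
y = [0.4, 0.3, 0.9]

thresholds = make_thresholds(weights, lows, highs, n_bits=20000)
bits_x = sketch(x, thresholds)
bits_y = sketch(y, thresholds)
exact = weighted_l1(x, y, weights)
approx = estimate_weighted_l1(bits_x, bits_y, weights, lows, highs)
```

Under these assumptions, a bit differs between two vectors exactly when its threshold falls between their values in the sampled dimension, so the expected Hamming distance is proportional to the weighted $\ell_1$ distance as long as all values lie inside the assumed ranges. Using fewer bits gives a more compact sketch at the cost of a noisier estimate.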
