Similarity indexing: algorithms and performance

Efficient indexing support is essential if content-based image and video databases that rely on similarity-based retrieval are to scale to large collections (tens of thousands to millions of images). In this paper, we take an in-depth look at this problem. One of the major difficulties is the high dimensionality (6 to 100 dimensions) of the feature vectors used to represent objects. We provide an overview of the work in computational geometry on this problem and highlight the results we found most useful in practice, including the use of approximate nearest neighbor algorithms. We also present a variant of the optimized k-d tree, which we call the VAM k-d tree, and provide algorithms to create an optimized R-tree, which we call the VAMSplit R-tree. We found that the VAMSplit R-tree provided better overall performance than every competing structure we tested, for both main-memory and secondary-memory applications. Relative to the R*-tree and SS-tree, we observed large performance improvements in secondary-memory applications; relative to optimized k-d tree variants, the improvements were modest.
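To make the two central ideas above concrete (an optimized k-d tree and approximate nearest neighbor search), the following is a minimal sketch under stated assumptions. It is not the paper's VAM k-d tree or VAMSplit R-tree; the abstract does not give their exact split rules. Instead, it shows the general technique: each internal node splits on the coordinate with the largest variance, at the median point, and the search prunes a subtree unless it could contain a point closer than the current best distance divided by (1 + ε). With ε = 0 the search is exact; with ε > 0 the returned distance is at most (1 + ε) times the true nearest-neighbor distance. The names `Node`, `build`, `nearest`, `leaf_size`, and `eps` are hypothetical, chosen for illustration only.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    # Leaf nodes hold a bucket of points; internal nodes hold a split plane.
    points: Optional[List[Point]] = None
    dim: int = 0
    value: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def variance(values: Sequence[float]) -> float:
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def build(points: List[Point], leaf_size: int = 8) -> Node:
    """Hypothetical builder: split on the max-variance coordinate at the median."""
    if len(points) <= leaf_size:
        return Node(points=list(points))
    d = len(points[0])
    dim = max(range(d), key=lambda k: variance([p[k] for p in points]))
    # Sorting is used for clarity; a linear-time selection would also work.
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    # Left points have pts[i][dim] <= value, right points have >= value.
    return Node(dim=dim, value=pts[mid][dim],
                left=build(pts[:mid], leaf_size),
                right=build(pts[mid:], leaf_size))


def dist2(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(root: Node, q: Point, eps: float = 0.0) -> Tuple[Optional[Point], float]:
    """(1 + eps)-approximate nearest neighbor; eps = 0 gives the exact answer."""
    best: List = [None, math.inf]  # [best point, squared distance]

    def visit(node: Node) -> None:
        if node.points is not None:
            for p in node.points:
                d2 = dist2(p, q)
                if d2 < best[1]:
                    best[0], best[1] = p, d2
            return
        diff = q[node.dim] - node.value
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Every point on the far side is at least |diff| away from q, so it is
        # skipped unless |diff| * (1 + eps) < current best distance.
        if diff * diff * (1 + eps) ** 2 < best[1]:
            visit(far)

    visit(root)
    return best[0], math.sqrt(best[1])
```

For example, `nearest(build(points), q, eps=0.5)` returns a point whose distance to `q` is within 1.5 times the true nearest-neighbor distance. The design choice to split on the highest-variance coordinate adapts the tree to the spread of the feature vectors, and a larger `eps` trades accuracy for fewer visited leaves, which matters most at the high dimensionalities discussed above.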