Similarity indexing with the SS-tree

Efficient indexing of high-dimensional feature vectors is important to allow visual information systems and a number of other applications to scale up to large databases. We define this problem as "similarity indexing" and describe the fundamental types of "similarity queries" that we believe should be supported. We also propose a new dynamic structure for similarity indexing called the similarity search tree, or SS-tree. In nearly every test we performed on high-dimensional data, this structure outperformed the R*-tree. Our tests also show that the SS-tree is much better suited to approximate queries than the R*-tree.
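
The abstract does not spell out the SS-tree's internals, so the following is only a minimal sketch under stated assumptions. It illustrates two query types commonly counted among similarity queries: range queries and k-nearest-neighbour queries. It assumes Euclidean distance and groups points into leaves bounded by spheres centred at each group's centroid. The SS-tree bounds its entries with such centroid-centred spheres, in contrast to the R*-tree's rectangles. The names `SphereNode`, `min_dist`, `range_query` and `knn_query` are hypothetical. The index is flat (one level, no insertion or node splitting), so this is not the paper's algorithm.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed metric; other metrics are possible)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SphereNode:
    """Hypothetical SS-tree-style leaf: points bounded by a sphere centred at
    their centroid, with a radius that reaches the farthest point."""

    def __init__(self, points: List[Vector]):
        self.points = points
        d = len(points[0])
        self.centroid = [sum(p[i] for p in points) / len(points) for i in range(d)]
        self.radius = max(dist(self.centroid, p) for p in points)

    def min_dist(self, q: Vector) -> float:
        # Lower bound on the distance from q to any point inside the sphere.
        return max(0.0, dist(q, self.centroid) - self.radius)


def range_query(nodes: List[SphereNode], q: Vector, eps: float) -> List[Vector]:
    """All points within distance eps of q; spheres that cannot intersect the
    query ball are skipped without examining their points."""
    result = []
    for node in nodes:
        if node.min_dist(q) > eps:
            continue  # the whole sphere lies outside the query ball
        result.extend(p for p in node.points if dist(q, p) <= eps)
    return result


def knn_query(nodes: List[SphereNode], q: Vector, k: int) -> List[Tuple[float, Vector]]:
    """The k nearest points as (distance, point) pairs, nearest first."""
    if k <= 0:
        return []
    best: List[Tuple[float, int, Vector]] = []  # max-heap via negated distances
    counter = 0  # tie-breaker so the heap never compares points directly
    # Visit spheres in order of their lower bound, so we can stop early.
    for node in sorted(nodes, key=lambda n: n.min_dist(q)):
        if len(best) == k and node.min_dist(q) >= -best[0][0]:
            break  # no remaining sphere can contain a closer point
        for p in node.points:
            d = dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, counter, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, counter, p))
            counter += 1
    return sorted(((-nd, p) for nd, _, p in best), key=lambda t: t[0])
```

For example, with two leaves `SphereNode([[0.0, 0.0], [1.0, 0.0]])` and `SphereNode([[10.0, 10.0], [11.0, 10.0]])`, a range query around `[0.0, 0.0]` with `eps=1.5` returns only the two points of the first leaf. The second sphere is rejected by its lower bound alone. The early `break` in `knn_query` relies on the spheres being visited in increasing order of `min_dist`.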
