Incremental Similarity Search in Multimedia Databases

Similarity search is an important operation in multimedia databases and other database applications involving complex objects. It consists of finding the objects in a data set S that are similar to a query object q, based on some distance measure d, usually a distance metric. Existing methods for similarity search in this setting fall into one of two classes. The first maps the objects into a low-dimensional vector space (making use of data structures such as the R-tree), while the second indexes the objects directly based on distances (making use of data structures such as the M-tree). We introduce a general framework for performing search based on distances, and present an incremental nearest neighbor algorithm that operates on an arbitrary “search hierarchy”. We show how this framework can be applied to both classes of similarity search methods by defining a suitable search hierarchy for a number of different indexing structures. Given an appropriate search hierarchy, our algorithm performs incremental similarity search: result objects are reported one by one in order of similarity to the query object, with as little effort as possible expended to produce each new result object. This is especially important in interactive database applications, since it allows partial query results to be displayed early. The incremental aspect also provides significant benefits when the number of desired neighbors is not known in advance. Furthermore, our algorithm is at least as efficient as existing k-nearest neighbor algorithms in terms of the number of distance computations and index node accesses. In fact, provided that the search hierarchy is properly defined, our algorithm can be shown to be optimal in the sense of performing as few distance computations and node accesses as possible, given the available index structure.
An experimental study confirms our reasoning and suggests that the overhead due to the incremental aspect is modest, especially when distance computations are expensive or when the indexing structure or data objects are stored on disk. This work was supported in part by the National Science Foundation under Grant IRI-97-12715.
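
The idea of reporting neighbors one by one from a search hierarchy can be illustrated with a best-first traversal driven by a priority queue. The following is only a minimal sketch under stated assumptions, not the paper's algorithm as specified: the `Node` class, the ball-shaped regions, the `lower_bound` helper, and the Euclidean distance are all hypothetical choices made for illustration. It assumes that d is a metric and that every node's ball covers all objects stored beneath it, so that the key of a node never exceeds the distance from q to any object below it.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """Hypothetical hierarchy element: a ball covering everything below it."""
    center: Point
    radius: float
    children: List["Node"] = field(default_factory=list)
    objects: List[Point] = field(default_factory=list)


def euclidean(a: Point, b: Point) -> float:
    return math.dist(a, b)


def lower_bound(q: Point, node: Node, d: Callable[[Point, Point], float]) -> float:
    # By the triangle inequality, no object inside the ball can be closer
    # to q than d(q, center) - radius (assumes d is a metric).
    return max(0.0, d(q, node.center) - node.radius)


def incremental_nn(
    q: Point, root: Node, d: Callable[[Point, Point], float] = euclidean
) -> Iterator[Tuple[Point, float]]:
    """Yield (object, distance) pairs in nondecreasing order of distance to q."""
    tie = itertools.count()  # tie-breaker so the heap never compares Node objects
    queue = [(lower_bound(q, root, d), next(tie), root)]
    while queue:
        key, _, item = heapq.heappop(queue)
        if isinstance(item, Node):
            # Expand the node: enqueue children by lower bound,
            # and stored objects by their actual distance.
            for child in item.children:
                heapq.heappush(queue, (lower_bound(q, child, d), next(tie), child))
            for obj in item.objects:
                heapq.heappush(queue, (d(q, obj), next(tie), obj))
        else:
            # An object at the front of the queue is the next nearest neighbor.
            yield item, key
```

Because `incremental_nn` is a generator, a caller can stop after any number of results (for example with `itertools.islice`) without fixing k in advance, and nodes whose lower bound exceeds the distance of the last requested neighbor are never expanded.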
