Efficient k-NN search on vertically decomposed data

Applications such as multimedia retrieval require efficient support for similarity search on large data collections. Yet nearest neighbor search is a difficult problem in high-dimensional spaces, which makes efficient applications hard to realize: index structures degrade rapidly with increasing dimensionality, while sequential search is not an attractive solution for repositories with millions of objects. This paper approaches the problem from a different angle. It seeks a solution in an unconventional storage scheme that opens up a new range of techniques for processing k-NN queries, especially suited to high-dimensional spaces. The suggested (physical) database design lends itself well to a novel variant of branch-and-bound search that quickly reduces the high-dimensional space to a small candidate set. The paper provides insight into applying this idea to k-NN search using two similarity metrics commonly encountered in image database applications, and discusses techniques for implementing it in relational database systems. The effectiveness of the proposed method is evaluated empirically on both real and synthetic data sets, and the evaluation shows significant improvements in response time.
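
To make the idea of pruning over vertically decomposed data concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes squared Euclidean distance (the abstract does not name its two metrics), feature values normalized to [0, 1], and a plain column-at-a-time scan. The function name `knn_columnwise` and the layout `columns[j][i]` are hypothetical, chosen only for illustration. After each column, the partial distance is a lower bound on the full distance, and the partial distance plus the largest possible remaining contribution is an upper bound. Any object whose lower bound exceeds the k-th smallest upper bound cannot be among the k nearest neighbors and is dropped.

```python
import heapq


def knn_columnwise(columns, query, k):
    """Return indices of the k nearest objects, nearest first.

    columns[j][i] holds dimension j of object i (one list per dimension).
    Assumption: every value, including the query, lies in [0, 1].
    """
    n = len(columns[0])
    candidates = list(range(n))
    partial = [0.0] * n  # squared distance over the dimensions seen so far

    # Largest squared difference any object can still add in dimension j.
    max_rest = [max(q, 1.0 - q) ** 2 for q in query]

    for j, column in enumerate(columns):
        q = query[j]
        for i in candidates:
            d = column[i] - q
            partial[i] += d * d

        if len(candidates) > k:
            # Upper bound on the contribution of the unseen dimensions.
            remaining = sum(max_rest[j + 1:])
            # k-th smallest upper bound among the current candidates.
            kth_partial = heapq.nsmallest(
                k, (partial[i] for i in candidates))[-1]
            threshold = kth_partial + remaining
            # Keep only objects whose lower bound does not exceed it.
            candidates = [i for i in candidates if partial[i] <= threshold]

    # For survivors, partial[] now holds the full squared distance.
    return sorted(candidates, key=lambda i: (partial[i], i))[:k]


# Four objects with three dimensions each, stored column by column.
columns = [
    [0.0, 1.0, 0.25, 0.5],
    [0.0, 1.0, 0.25, 0.5],
    [0.0, 1.0, 0.0, 0.5],
]
print(knn_columnwise(columns, [0.0, 0.0, 0.0], 2))  # prints [0, 2]
```

In this sketch the candidate list shrinks as more columns are read, so later columns are touched only for the surviving objects. How quickly it shrinks depends on the data and on the order in which dimensions are processed, which this sketch leaves at the storage order.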
