Array-index: a plug&search K nearest neighbors method for high-dimensional data

Previous algorithms of data partitioning methods (DPMs) to find the exact K-nearest neighbors (KNN) at high dimensions are outperformed by a linear scan method [J.M. Kleinberg, Two algorithms for nearest neighbor search in high dimensions, 29th ACM Symposium on Theory of computing, 1997; R. Weber, H.-J. Schek, S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, in: Proc. of the 24th VLDB, USA, 1998]. In this paper, we present a "plug& search" method to greatly speed up the exact KNN search of existing DPMs. The idea is to linearize the data partitions produced by a DPM, rather than the points themselves, into a one-dimensional array-index, that is simple, compact and fast. Unlike most DPMs that support KNN search, which require storage space linear, or exponential [J.M. Kleinberg, Two algorithms for nearest neighbor search in high dimensions, 29th ACM Symposium on Theory of computing, 1997; M. Hagedoom, Nearest neighbors can be found efficiently if the dimension is small relative to the input size, ICDT 2003], in dimensions, the array-index requires a storage space that is linear in the number of mapped partitions.

[1]  David Salesin,et al.  Fast multiresolution image querying , 1995, SIGGRAPH.

[2]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[3]  Xiaoming Zhu,et al.  An efficient indexing method for nearest neighbor searches in high-dirnensional image databases , 2002, IEEE Trans. Multim..

[4]  Eamonn J. Keogh,et al.  Locally adaptive dimensionality reduction for indexing large time series databases , 2001, SIGMOD '01.

[5]  Jürg Nievergelt,et al.  The Grid File: An Adaptable, Symmetric Multikey File Structure , 1984, TODS.

[6]  Clu-istos Foutsos,et al.  Fast subsequence matching in time-series databases , 1994, SIGMOD '94.

[7]  Christos Faloutsos,et al.  FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets , 1995, SIGMOD '95.

[8]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[9]  Martin L. Kersten,et al.  Efficient k-NN search on vertically decomposed data , 2002, SIGMOD '02.

[10]  Beng Chin Ooi,et al.  Indexing the Distance: An Efficient Method to KNN Processing , 2001, VLDB.

[11]  Elias Pampalk,et al.  EMPIRICAL EVALUATION OF CLUSTERING ALGORITHMS , 2000 .

[12]  Teuvo Kohonen,et al.  Self-Organizing Maps , 2010 .

[13]  Teuvo Kohonen,et al.  Self-Organizing Maps, Second Edition , 1997, Springer Series in Information Sciences.

[14]  Michiel Hagedoorn Nearest Neighbors Can Be Found Efficiently If the Dimension Is Small Relative to the Input Size , 2003, ICDT.

[15]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[16]  Jon M. Kleinberg,et al.  Two algorithms for nearest-neighbor search in high dimensions , 1997, STOC '97.

[17]  Charu C. Aggarwal,et al.  Towards meaningful high-dimensional nearest neighbor search by human-computer interaction , 2002, Proceedings 18th International Conference on Data Engineering.

[18]  Jack A. Orenstein Spatial query processing in an object-oriented database system , 1986, SIGMOD '86.

[19]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[20]  Jon Louis Bentley,et al.  Multidimensional Binary Search Trees in Database Applications , 1979, IEEE Transactions on Software Engineering.

[21]  Ramesh C. Jain,et al.  Similarity indexing with the SS-tree , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[22]  Christos Faloutsos,et al.  How to improve the pruning ability of dynamic metric access methods , 2002, CIKM '02.

[23]  Christos Faloutsos,et al.  Fast and Effective Retrieval of Medical Tumor Shapes , 1998, IEEE Trans. Knowl. Data Eng..

[24]  Nimrod Megiddo,et al.  Fast indexing method for multidimensional nearest-neighbor search , 1998, Electronic Imaging.

[25]  Philip S. Yu,et al.  Fast algorithms for projected clustering , 1999, SIGMOD '99.

[26]  Christian Böhm,et al.  Independent quantization: an index compression technique for high-dimensional data spaces , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[27]  Christos Faloutsos,et al.  Searching Multimedia Databases by Content , 1996, Advances in Database Systems.

[28]  Christos Faloutsos,et al.  Fractals for secondary key retrieval , 1989, PODS.

[29]  Sharad Mehrotra,et al.  Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces , 2000, VLDB.

[30]  Zaher Al Aghbari,et al.  Fast k-NN Image Search with Self-Organizing Maps , 2002, CIVR.

[31]  Shin'ichi Satoh,et al.  The SR-tree: an index structure for high-dimensional nearest neighbor queries , 1997, SIGMOD '97.

[32]  Dragutin Petkovic,et al.  Query by Image and Video Content: The QBIC System , 1995, Computer.

[33]  Nick Roussopoulos,et al.  Nearest neighbor queries , 1995, SIGMOD '95.

[34]  Beng Chin Ooi,et al.  An adaptive and efficient dimensionality reduction algorithm for high-dimensional indexing , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).