Filter ranking in high-dimensional space

High-dimensional index structures accelerate database query processing over high-dimensional data such as multimedia feature vectors. In many application scenarios, it is of particular interest to rank data items with respect to a given distance function and thus identify the nearest neighbor(s) of a query item. In this paper, we propose a novel ranking algorithm that (1) operates on arbitrary high-dimensional filter indexes, such as the VA-file, the VA+-file, the LPC-file, or the AV-method. Our ranking algorithm (2) exhibits a nearly balanced I/O load when retrieving subsequent items. Finally, it (3) strictly obeys a predefined main memory threshold and (4) terminates successfully even when memory restrictions are very tight.
