Paper information - The IGrid index: reversing the dimensionality curse for similarity indexing in high dimensional space

The similarity search and indexing problem is well known to be a difficult one for high dimensional applications. Most indexing structures show a rapid degradation with increasing dimensionality, which leads to an access of the entire database for each query. Furthermore, recent research results show that in high dimensional space, even the concept of similarity may not be very meaningful. In this paper, we propose the IGrid-index, a method for similarity indexing which uses a distance function whose meaningfulness is retained with increasing dimensionality. In addition, this technique shows performance which is unique among all known index structures: the percentage of data accessed is inversely proportional to the overall data dimensionality. Thus, this technique relies on the dimensionality being high in order to provide performance-efficient similarity results. The IGrid-index can also support a special kind of query which we refer to as projected range queries, a query type which is increasingly relevant for very high dimensional data mining applications.

Philip S. Yu | Charu C. Aggarwal
