Paper information - Indexing high-dimensional data for content-based retrieval in large databases

Many indexing approaches for high-dimensional data points have evolved into very complex and hard-to-code algorithms. Sometimes this complexity is not matched by an increase in performance. Motivated by these ideas, we take a step back and look at simpler approaches to indexing multimedia data. In this paper we propose a simple (not simplistic) yet efficient indexing structure for high-dimensional data points of variable dimension, using dimension reduction. Our approach maps multidimensional points to a 1D line by computing their Euclidean norm and uses a B+-Tree to store the data points. We exploit the B+-Tree's efficient sequential search to develop simple yet performant methods to implement point, range and nearest-neighbor queries. To evaluate our technique, we conducted a set of experiments using both synthetic and real data. We analyze creation, insertion and query times as a function of data set size and dimension. Results so far show that our simple scheme outperforms current approaches, such as the Pyramid Technique, the A-Tree and the SR-Tree, for many data distributions. Moreover, our approach seems to scale better with both growing dimensionality and data set size, while exhibiting low insertion and search times.
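The mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. A sorted Python list with binary search stands in for the B+-Tree, all points are assumed to share one dimension, and the names `NormIndex`, `insert`, `range_query` and `nearest` are hypothetical. The abstract does not spell out the query algorithms, so the pruning rule used here (the reverse triangle inequality, | ||p|| - ||q|| | <= ||p - q||) is an assumption about how a norm-keyed index can discard candidates.

```python
import bisect
import heapq
import math


def norm(point):
    """Euclidean norm, used as the 1D key of a point."""
    return math.sqrt(sum(x * x for x in point))


class NormIndex:
    """Toy norm-keyed index; a sorted list stands in for the B+-Tree (assumption)."""

    def __init__(self):
        self._keys = []    # norms, kept sorted
        self._points = []  # points, in the same order as _keys

    def insert(self, point):
        key = norm(point)
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._points.insert(pos, tuple(point))

    def range_query(self, query, radius):
        """Return every stored point within `radius` of `query`."""
        q = norm(query)
        # Any point within `radius` must have a norm in [q - radius, q + radius].
        lo = bisect.bisect_left(self._keys, q - radius)
        hi = bisect.bisect_right(self._keys, q + radius)
        # Sequential scan of the candidate keys, then exact distance filtering.
        return [p for p in self._points[lo:hi] if math.dist(p, query) <= radius]

    def nearest(self, query, k=1):
        """Return the k nearest stored points, closest first."""
        q = norm(query)
        right = bisect.bisect_left(self._keys, q)
        left = right - 1
        best = []  # max-heap emulated with negated distances: (-dist, point)
        while left >= 0 or right < len(self._keys):
            gap_l = q - self._keys[left] if left >= 0 else math.inf
            gap_r = self._keys[right] - q if right < len(self._keys) else math.inf
            # Visit the neighbour whose key is closest to the query's key.
            if gap_l <= gap_r:
                gap, idx = gap_l, left
                left -= 1
            else:
                gap, idx = gap_r, right
                right += 1
            # The key gap lower-bounds the true distance, so once it exceeds
            # the current k-th best distance no remaining point can be closer.
            if len(best) == k and gap > -best[0][0]:
                break
            d = math.dist(self._points[idx], query)
            if len(best) < k:
                heapq.heappush(best, (-d, self._points[idx]))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, self._points[idx]))
        return [p for _, p in sorted(best, key=lambda t: -t[0])]
```

In this sketch a point query is a range query with radius 0. Points that have similar norms but lie far apart still land in the candidate range, so the exact distance check after the 1D scan is what keeps the results correct.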

Joaquim A. Jorge | Manuel J. Fonseca
