Spatial indexing of high-dimensional data based on relative approximation

Abstract. We propose a novel index structure, the A-tree (approximation tree), for similarity searches in high-dimensional data. The basic idea of the A-tree is the introduction of virtual bounding rectangles (VBRs) which contain and approximate MBRs or data objects. VBRs can be represented quite compactly and thus affect the tree configuration both quantitatively and qualitatively. First, since tree nodes can contain a large number of VBR entries, fanout becomes large, which increases search speed. More importantly, we have a free hand in arranging MBRs and VBRs in the tree nodes. Each A-tree node contains an MBR and its children VBRs. Therefore, by fetching an A-tree node, we can obtain information on the exact position of a parent MBR and the approximate position of its children. We have performed experiments using both synthetic and real data sets. For the real data sets, the A-tree outperforms the SR-tree and the VA-file in all dimensionalities up to 64 dimensions, which is the highest dimension in our experiments. Additionally, we propose a cost model for the A-tree. We verify the validity of the cost model for synthetic and real data sets.

[1]  Christos Faloutsos,et al.  Estimating the Selectivity of Spatial Queries Using the 'Correlation' Fractal Dimension , 1995, VLDB.

[2]  Christos Faloutsos,et al.  On the 'Dimensionality Curse' and the 'Self-Similarity Blessing' , 2001, IEEE Trans. Knowl. Data Eng..

[3]  Oliver Günther,et al.  Multidimensional access methods , 1998, CSUR.

[4]  Takeo Kanade,et al.  Intelligent Access to Digital Video: Informedia Project , 1996, Computer.

[5]  Shin'ichi Satoh,et al.  The SR-tree: an index structure for high-dimensional nearest neighbor queries , 1997, SIGMOD '97.

[6]  Christos Faloutsos,et al.  Hilbert R-tree: An Improved R-tree using Fractals , 1994, VLDB.

[7]  Masatoshi Yoshikawa,et al.  The A-tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation , 2000, VLDB.

[8]  Andreas Henrich A Distance Scan Algorithm for Spatial Access Structures , 1994, ACM-GIS.

[9]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[10]  Hanan Samet,et al.  Ranking in Spatial Databases , 1995, SSD.

[11]  Ramesh C. Jain,et al.  Similarity indexing with the SS-tree , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[12]  Alex Pentland,et al.  View-based and modular eigenspaces for face recognition , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[14]  Christian Böhm,et al.  Independent quantization: an index compression technique for high-dimensional data spaces , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[15]  Christian Böhm,et al.  A cost model for nearest neighbor search in high-dimensional data space , 1997, PODS.

[16]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[17]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[18]  Nick Roussopoulos,et al.  Nearest neighbor queries , 1995, SIGMOD '95.

[19]  Hans-Peter Kriegel,et al.  The Buddy-Tree: An Efficient and Robust Access Method for Spatial Data Base Systems , 1990, VLDB.

[20]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[21]  Christos Faloutsos,et al.  Analysis of n-Dimensional Quadtrees using the Hausdorff Fractal Dimension , 1996, VLDB.

[22]  Christos Faloutsos,et al.  Multidimensional Access Methods: Trees Have Grown Everywhere , 1997, VLDB.

[23]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.