The application of space-filling curves to the storage and retrieval of multi-dimensional data

Indexing of multi-dimensional data has been the focus of a considerable amount of research effort over many years, but no generally agreed paradigm has emerged to compare with the impact that the B-Tree, for example, has had on the indexing of one-dimensional data. At the same time, efficient methods are ever more important as databases grow larger and more complex in their structures. Mapping multi-dimensional data to one dimension, thus enabling one-dimensional access methods to be exploited, has been suggested in the literature, but for the most part interest has been confined to the Z-order curve. The possibility of using other curves, such as the Hilbert and Gray-code curves, whose characteristics differ from those of the Z-order curve, has also been suggested. In this thesis we design and implement a working file store which is underpinned by the principle of mapping multi-dimensional data to one of a variety of space-filling curves and their variants. Data is then indexed using a B+ Tree which remains compact regardless of the data volume and the number of dimensions. The implementation has entailed developing algorithms for mapping data to one dimension and, most importantly, developing algorithms that allow data to be queried flexibly. We focus on the Hilbert curve but also consider other curves, and we propose new alternative algorithms for querying data mapped to the Z-order curve. The current implementation accommodates data in up to sixteen dimensions, but the approach is generic and not limited to this number. We report on preliminary testing of the implementation, which provides very encouraging results. We also undertake a brief exploration of the application of space-filling curves to the indexing of spatial data.
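The mapping step at the heart of this approach can be illustrated with a short Python sketch. It is not the thesis's own algorithm: the Hilbert mapping below is a port of John Skilling's transpose-based method ("Programming the Hilbert curve", 2004), the Z-order key is plain bit interleaving, and a sorted Python list maintained with `bisect.insort` stands in for the B+ Tree. The names `z_order_index`, `hilbert_index` and `build_index` are hypothetical, and every coordinate is assumed to be a non-negative integer below `2**bits`.

```python
from bisect import insort
from typing import Callable, List, Sequence, Tuple


def z_order_index(coords: Sequence[int], bits: int) -> int:
    """Z-order (Morton) key: interleave coordinate bits, most significant level first."""
    key = 0
    for level in range(bits - 1, -1, -1):
        for c in coords:
            key = (key << 1) | ((c >> level) & 1)
    return key


def hilbert_index(coords: Sequence[int], bits: int) -> int:
    """Hilbert key for an n-dimensional point (port of Skilling's transpose method).

    Illustrative stand-in only, not the mapping algorithm developed in the thesis.
    """
    x = list(coords)
    n = len(x)
    top = 1 << (bits - 1)
    # Undo the excess rotations/reflections, working from the top bit down.
    q = top
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p  # invert the low bits of x[0]
            else:
                t = (x[0] ^ x[i]) & p  # exchange the low bits of x[0] and x[i]
                x[0] ^= t
                x[i] ^= t
        q >>= 1
    # Gray-encode the transformed coordinates.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t
    # The Hilbert key is the bit interleaving of the transformed coordinates.
    return z_order_index(x, bits)


def build_index(
    points: Sequence[Sequence[int]],
    bits: int,
    curve: Callable[[Sequence[int], int], int] = hilbert_index,
) -> List[Tuple[int, Tuple[int, ...]]]:
    """Hypothetical one-dimensional store: (key, point) pairs kept sorted by curve key."""
    index: List[Tuple[int, Tuple[int, ...]]] = []
    for p in points:
        insort(index, (curve(p, bits), tuple(p)))
    return index


if __name__ == "__main__":
    pts = [(3, 0), (0, 0), (1, 1), (0, 1)]
    print([p for _, p in build_index(pts, bits=2)])
    # → [(0, 0), (1, 1), (0, 1), (3, 0)]
    print([p for _, p in build_index(pts, bits=2, curve=z_order_index)])
    # → [(0, 0), (0, 1), (1, 1), (3, 0)]
```

Both curves feed the same one-dimensional store; only the key function changes. Because the Hilbert key is simply the Z-order interleaving of transformed coordinates, the sketch reuses `z_order_index` for the final step, and the same code handles any number of dimensions.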
