Large-scale indexing of spatial data in distributed repositories: the SD-Rtree

We propose a scalable distributed data structure (SDDS) called SD-Rtree. We intend our structure for point, window and kNN queries over large spatial datasets distributed on clusters of interconnected servers. The structure balances the storage and processing load over the available resources, and aims at minimizing the size of the cluster. SD-Rtree generalizes the well-known Rtree structure. It uses a distributed balanced binary tree that scales with insertions to potentially any number of storage servers by splitting the overloaded ones. A user or application manipulates the structure from a client node. The client addresses the tree through its image, which may be outdated because of later splits. An outdated image may cause addressing errors, which are resolved by forwarding among the servers. Specific messages sent to the clients incrementally correct the outdated images. We present the construction of an SD-Rtree through insertions, focusing on the split and rotation algorithms, followed by the query algorithms. We then describe a flexible allocation protocol that copes with a temporary shortage of storage resources through data storage balancing. Experiments show additional aspects of SD-Rtree and compare its behavior with a distributed quadtree. The results justify our various design choices and the overall utility of the structure.
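The interplay between an outdated client image, server-side forwarding, and image correction can be illustrated with a minimal Python sketch. It is not the SD-Rtree algorithm itself: the class names (`Server`, `Cluster`, `Client`), the `insert_via` method, the unit-square data space, the midpoint split of a rectangular region, and the direct lookup that stands in for routing through the distributed tree are all assumptions made for illustration. Rotations, overlapping regions, and queries are omitted.

```python
from dataclasses import dataclass, field


def contains(region, p):
    """True if point p = (x, y) lies in region = (xmin, ymin, xmax, ymax)."""
    return region[0] <= p[0] <= region[2] and region[1] <= p[1] <= region[3]


@dataclass
class Server:
    sid: int
    region: tuple
    points: list = field(default_factory=list)


class Cluster:
    """Toy cluster over the unit square (hypothetical, not the paper's design)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.servers = {0: Server(0, (0.0, 0.0, 1.0, 1.0))}

    def _split(self, s):
        # Assumption: halve the region along its longer side. The real
        # structure uses an R-tree style split plus rotations instead.
        xmin, ymin, xmax, ymax = s.region
        if xmax - xmin >= ymax - ymin:
            axis, mid = 0, (xmin + xmax) / 2
            keep, give = (xmin, ymin, mid, ymax), (mid, ymin, xmax, ymax)
        else:
            axis, mid = 1, (ymin + ymax) / 2
            keep, give = (xmin, ymin, xmax, mid), (xmin, mid, xmax, ymax)
        moved = [p for p in s.points if p[axis] >= mid]
        new = Server(len(self.servers), give, moved)
        s.points = [p for p in s.points if p[axis] < mid]
        s.region = keep
        self.servers[new.sid] = new
        return new

    def insert_via(self, contacted, p):
        """Insert p starting at server `contacted`; forward on a miss."""
        touched = {contacted}
        target = self.servers[contacted]
        hops = 0
        if not contains(target.region, p):
            # Addressing error: forward to the server that covers p.
            # (Stand-in for routing through the distributed tree.)
            target = next(s for s in self.servers.values()
                          if contains(s.region, p))
            hops = 1
            touched.add(target.sid)
        target.points.append(p)
        if len(target.points) > self.capacity:
            touched.add(self._split(target).sid)
        # Image-adjustment message: current regions of the servers involved.
        iam = {sid: self.servers[sid].region for sid in touched}
        return hops, iam


class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        # Local image of the structure; it may become outdated after splits.
        self.image = {0: cluster.servers[0].region}

    def insert(self, p):
        # Address the server that the (possibly stale) image suggests.
        guess = next((sid for sid, r in self.image.items()
                      if contains(r, p)), 0)
        hops, iam = self.cluster.insert_via(guess, p)
        self.image.update(iam)  # incremental correction of the image
        return hops
```

In this sketch, a client whose image predates a split contacts the wrong server once, pays one forwarding hop, and then addresses the correct server directly because the returned message has refreshed its image.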
