Spatial indexing of distributed multidimensional datasets

Declustering methods for distributed multidimensional indexing of large datasets have been widely researched, but replication techniques for multidimensional indexes have received much less attention. In a wide area network, a centralized index server, rather than the data servers, may become the performance bottleneck, since the index is likely to be accessed more often than any single dataset on the servers. In this paper, we present two multidimensional indexing algorithms for a distributed environment: a centralized global index and a two-level hierarchical index. Our experimental results show that the centralized scheme does not scale well for either inserting into or searching the index. To improve the scalability of the index server, we employ a replication protocol for both the centralized and two-level index schemes that allows some inconsistency between replicas without affecting correctness. Our experiments show that the two-level hierarchical index scales better than the non-replicated centralized index for both building and searching the index, but that replication can make the centralized index faster than the two-level hierarchical index for searching in some cases.
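To make the two-level idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the top level keeps one bounding box per data server and that each server keeps its own local index. The names `TwoLevelIndex`, `LocalIndex`, `intersects`, and `enclose` are hypothetical, and the local index is a plain list scan standing in for whatever spatial structure (for example an R-tree) a real system would use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A box is (lower corner, upper corner); both corners have the same dimension.
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]


def intersects(a: Box, b: Box) -> bool:
    """True if the two axis-aligned boxes overlap in every dimension."""
    return all(alo <= bhi and blo <= ahi
               for alo, ahi, blo, bhi in zip(a[0], a[1], b[0], b[1]))


def enclose(a: Box, b: Box) -> Box:
    """Smallest axis-aligned box that contains both a and b."""
    lo = tuple(min(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(max(x, y) for x, y in zip(a[1], b[1]))
    return (lo, hi)


@dataclass
class LocalIndex:
    """Per-server index (assumption: a linear list instead of a real tree)."""
    entries: List[Tuple[Box, str]] = field(default_factory=list)

    def insert(self, box: Box, chunk_id: str) -> None:
        self.entries.append((box, chunk_id))

    def search(self, query: Box) -> List[str]:
        return [cid for box, cid in self.entries if intersects(box, query)]


@dataclass
class TwoLevelIndex:
    """Top level: one summary box per server. Bottom level: local indexes."""
    servers: Dict[str, LocalIndex] = field(default_factory=dict)
    summaries: Dict[str, Box] = field(default_factory=dict)

    def insert(self, server: str, box: Box, chunk_id: str) -> None:
        self.servers.setdefault(server, LocalIndex()).insert(box, chunk_id)
        old = self.summaries.get(server)
        # The top level only grows the server's summary box.
        self.summaries[server] = box if old is None else enclose(old, box)

    def search(self, query: Box) -> List[Tuple[str, str]]:
        hits: List[Tuple[str, str]] = []
        for server, summary in self.summaries.items():
            # Only servers whose summary overlaps the query are contacted.
            if intersects(summary, query):
                hits.extend((server, cid)
                            for cid in self.servers[server].search(query))
        return hits
```

In this sketch a range query touches the top level once and then only those servers whose summary box overlaps the query. A centralized global index would instead hold every `(box, chunk_id)` entry in a single structure on the index server.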
