A Two-level Spatial In-Memory Index

Very large volumes of spatial data are increasingly becoming available and demand effective management. While there have been decades of research on spatial data management, few works consider the current state of commodity hardware, which offers relatively large memory and parallel multi-core processing. In this work, we reconsider the design of spatial indexing under this new reality. Specifically, we propose a main-memory indexing approach for objects with spatial extent, which is based on a classic regular partitioning of space into disjoint tiles. The novelty of our index is that the contents of each tile are further partitioned into four classes. This second-level partitioning not only reduces the number of comparisons required to compute the results, but also avoids the generation and elimination of duplicate results, which is an inherent problem of spatial indexes based on disjoint space partitioning. The spatial partitions defined by our indexing scheme are fully independent, which makes parallel evaluation straightforward, as no synchronization or communication between the partitions is necessary. We show how our index can be used to efficiently process spatial range queries and to drastically reduce the cost of their refinement step. In addition, we study the efficient processing of numerous range queries in batch and in parallel. Extensive experiments on real datasets confirm the efficiency of our approaches.
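
The two-level idea can be illustrated with a minimal Python sketch. The abstract does not define the four classes, so the sketch assumes one plausible definition: within a tile, a rectangle is classified by whether it starts inside the tile or before it, independently in x and in y. The names `TwoLevelGrid`, `insert`, `range_query`, and `intersects` are hypothetical, the data space is assumed to be the unit square, and objects are reduced to their bounding rectangles; this is not the authors' implementation.

```python
from collections import defaultdict


def intersects(r, q):
    """Closed-rectangle intersection test; rectangles are (xl, yl, xu, yu)."""
    return r[0] <= q[2] and q[0] <= r[2] and r[1] <= q[3] and q[1] <= r[3]


class TwoLevelGrid:
    """Regular n x n grid whose tiles are split into four classes (assumed definition).

    Class bit 0 is set if the rectangle starts before the tile in x,
    class bit 1 is set if it starts before the tile in y.
    """

    def __init__(self, n, extent=1.0):
        self.n = n
        self.cell = extent / n
        # tiles[(ix, iy)] is a list of four buckets, one per class.
        self.tiles = defaultdict(lambda: [[], [], [], []])

    def _idx(self, v):
        # Map a coordinate to a tile index, clamped to the grid.
        return min(self.n - 1, max(0, int(v // self.cell)))

    def insert(self, oid, rect):
        x0, x1 = self._idx(rect[0]), self._idx(rect[2])
        y0, y1 = self._idx(rect[1]), self._idx(rect[3])
        for ix in range(x0, x1 + 1):
            for iy in range(y0, y1 + 1):
                cls = (1 if ix != x0 else 0) | (2 if iy != y0 else 0)
                self.tiles[(ix, iy)][cls].append((oid, rect))

    def range_query(self, q):
        x0, x1 = self._idx(q[0]), self._idx(q[2])
        y0, y1 = self._idx(q[1]), self._idx(q[3])
        out = []
        for ix in range(x0, x1 + 1):
            for iy in range(y0, y1 + 1):
                tile = self.tiles.get((ix, iy))
                if tile is None:
                    continue
                for cls, bucket in enumerate(tile):
                    # If the query starts before this tile in a dimension, any
                    # rectangle that also starts before the tile in that dimension
                    # is handled in an earlier tile, so the whole class is skipped.
                    if ix > x0 and cls & 1:
                        continue
                    if iy > y0 and cls & 2:
                        continue
                    out.extend(oid for oid, r in bucket if intersects(r, q))
        return out


grid = TwoLevelGrid(n=4)
grid.insert(1, (0.10, 0.10, 0.60, 0.60))  # replicated into nine tiles
grid.insert(2, (0.70, 0.70, 0.90, 0.90))
grid.insert(3, (0.30, 0.05, 0.35, 0.95))
result = grid.range_query((0.20, 0.20, 0.80, 0.80))
```

Under this assumed class definition, each qualifying rectangle is reported by exactly one tile (the first tile shared by the rectangle and the query in each dimension), so no duplicate elimination pass is needed. Because tiles share no state during a query, the loop over tiles could in principle be distributed across threads without synchronization.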
