A User-centric View of Distributed Spatial Data Management Systems

Distributed spatial data management systems (DSDMSs) are a new technology capable of managing huge volumes of spatial data using parallel and distributed frameworks. An increasing number of DSDMSs have been proposed in the literature, which calls for a comparison among them. However, the comparisons available in the literature provide only a system-centric view of DSDMSs, based essentially on performance evaluations. There is thus a lack of comparisons from a user-centric view, which aims to help users understand how the characteristics of DSDMSs meet the specific requirements of their spatial applications. In this paper, we fill this gap. We provide a user-centric comparison of Hadoop-GIS, SpatialHadoop, SpatialSpark, GeoSpark, SIMBA, LocationSpark, SparkGIS, and Elcano, based on an extensive set of criteria related to the characteristics of spatial data handling and to aspects inherent to distributed systems. Based on this comparison, we introduce a set of guidelines to help users choose an appropriate DSDMS. We also describe a case study that illustrates the use of these guidelines.
