In-memory Spatial-Aware Framework for Processing Proximity-Alike Queries in Big Spatial Data

The widespread adoption of sensor-enabled, ubiquitous mobile devices has caused an avalanche of big data, most of which is geospatially tagged. Most cloud-based big data processing systems are designed for general-purpose workloads and neglect spatial characteristics. However, interesting analytics often seek answers to proximity-alike queries. We fill this gap by providing a custom geospatial service layer atop Apache Spark. Specifically, we leverage Spark to design a custom spatial-aware partitioning method that boosts geospatial query performance. Our results show that our patches outperform state-of-the-art implementations by a significant margin.
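To illustrate the general idea behind spatial-aware partitioning, the following minimal sketch assigns points to cells of a uniform grid and answers a radius query by scanning only the cells the query circle can overlap. It is not the partitioning method proposed here: the uniform grid, the planar distance measured in degrees, the `CELL_SIZE` value, and all function names are assumptions made for illustration, and the sketch runs in plain Python rather than on Spark.

```python
import math
from collections import defaultdict

CELL_SIZE = 0.01  # grid cell edge in degrees (hypothetical value)


def cell_of(lon, lat, cell_size=CELL_SIZE):
    """Map a coordinate to the integer (column, row) of its grid cell."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def partition_points(points, cell_size=CELL_SIZE):
    """Group points by grid cell, so nearby points share a partition key."""
    partitions = defaultdict(list)
    for lon, lat in points:
        partitions[cell_of(lon, lat, cell_size)].append((lon, lat))
    return partitions


def within_radius(partitions, lon, lat, radius, cell_size=CELL_SIZE):
    """Return points within `radius` (planar degrees) of (lon, lat),
    scanning only the cells that the query circle can overlap."""
    reach = math.ceil(radius / cell_size)  # how many cells the circle may span
    col, row = cell_of(lon, lat, cell_size)
    hits = []
    for dc in range(-reach, reach + 1):
        for dr in range(-reach, reach + 1):
            # .get avoids creating empty partitions for unvisited cells
            for p_lon, p_lat in partitions.get((col + dc, row + dr), []):
                if math.hypot(p_lon - lon, p_lat - lat) <= radius:
                    hits.append((p_lon, p_lat))
    return hits
```

In a distributed setting the cell identifier would play the role of the partition key, so that a proximity query touches only the few partitions near the query point instead of the whole dataset.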
