Spatio-Temporal Hotspot Computation on Apache Spark (GIS Cup)

Large quantities of mobility data are produced by people and vehicles every day. Mining and analyzing patterns in this data, such as hotspots, can improve location-based services. However, the sheer volume of the data calls for efficient processing techniques in distributed environments, using frameworks such as Apache Spark. In this work, within the scope of the GIS Cup 2016, we focus on detecting statistically significant hotspots in large-scale spatio-temporal data using the Getis-Ord Gi* statistic on top of the Spark framework. Using a uniform spatio-temporal partitioning, we find the most significant drop-off locations in New York City taxi trip data based on passenger count. We present a baseline and two variants of an optimized solution to the problem. Finally, we compare the performance of the proposed algorithms in an experimental evaluation.

CCS Concepts
•Computing methodologies → MapReduce algorithms; •Information systems → Spatial-temporal systems;
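
To make the hotspot computation summarized in the abstract concrete, the following is a minimal single-machine sketch in Python, not the Spark implementation evaluated in this work. It assumes the Gi* form of the Getis-Ord statistic with binary weights over each cell's immediate neighbors in x, y, and time (the cell itself included), and a bounded uniform grid. The names `cell_of`, `aggregate`, `gi_star`, and `top_k`, as well as the cell size and time step, are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(lon, lat, day, cell_deg=0.001, step_days=1):
    """Map a drop-off to a uniform (x, y, t) cell. Cell sizes are assumed values."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg), day // step_days)


def aggregate(trips):
    """Sum passenger counts per cell; stands in for a distributed reduce-by-key."""
    totals = defaultdict(int)
    for cell, passengers in trips:
        totals[cell] += passengers
    return dict(totals)


def gi_star(values, shape):
    """Getis-Ord Gi* z-score for every non-empty cell of a bounded grid.

    values: dict mapping (x, y, t) -> attribute value (empty cells omitted).
    shape:  (nx, ny, nt), so valid indices are 0..nx-1, 0..ny-1, 0..nt-1.
    Assumes binary weights: 1 for the cell and its in-bounds neighbors, else 0.
    """
    nx, ny, nt = shape
    n = nx * ny * nt  # empty cells count toward the global mean and deviation
    mean = sum(values.values()) / n
    variance = sum(v * v for v in values.values()) / n - mean * mean
    s = math.sqrt(max(variance, 0.0))
    scores = {}
    for (x, y, t) in values:
        w = 0        # number of in-bounds neighbors (sum of binary weights)
        local = 0.0  # weighted sum of neighbor values
        for dx, dy, dt in product((-1, 0, 1), repeat=3):
            cx, cy, ct = x + dx, y + dy, t + dt
            if 0 <= cx < nx and 0 <= cy < ny and 0 <= ct < nt:
                w += 1
                local += values.get((cx, cy, ct), 0)
        # With binary weights, the sum of squared weights equals w.
        denom = s * math.sqrt((n * w - w * w) / (n - 1))
        scores[(x, y, t)] = (local - mean * w) / denom if denom > 0 else 0.0
    return scores


def top_k(scores, k):
    """Return the k cells with the highest z-scores."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy input: (cell, passenger_count) pairs on a 5 x 5 x 5 grid.
    trips = [((2, 2, 2), 4), ((2, 2, 2), 6), ((0, 0, 0), 1)]
    for cell, z in top_k(gi_star(aggregate(trips), (5, 5, 5)), 2):
        print(cell, round(z, 3))
```

In a distributed setting, the aggregation step would correspond to a keyed reduction over cells, and the neighbor sum would require exchanging values between adjacent cells. The sketch scores only non-empty cells, which is a simplification: an empty cell surrounded by busy ones can also receive a high z-score.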