Hot Spot Analysis over Big Trajectory Data

Hot spot analysis is the problem of identifying statistically significant spatial clusters in an underlying data set. In this paper, we study hot spot analysis for massive trajectory data of moving objects. The problem has many real-life applications across domains, especially the analysis of vast repositories of historical spatio-temporal traces (of cars, vessels, and aircraft). To identify hot spots, we propose an approach based on the Getis-Ord statistic, which has previously been used successfully for point data. Since trajectory data is more than just a collection of individual points, we formulate the problem of trajectory hot spot analysis using the Getis-Ord statistic. We propose a parallel and scalable algorithm for this problem, called THS, which provides an exact solution and can operate on very large data sets. Moreover, we introduce an approximate algorithm (aTHS) that avoids exhaustive computation and trades off accuracy for efficiency in a controlled manner. In essence, we provide a method that quantifies the maximum error induced by the approximation in relation to the achieved computational savings. We implement our algorithms in Apache Spark and demonstrate the scalability and efficiency of our approach using a large, historical, real-life trajectory data set of vessels sailing in the Eastern Mediterranean over a period of three years.
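As a rough illustration of the statistic the approach builds on, the following Python sketch computes the standard Getis-Ord Gi* z-score for every cell of a dense spatio-temporal grid indexed by (x, y, t). It is not THS or aTHS. The function name gi_star is hypothetical. The per-cell value is assumed to be a trajectory-derived quantity, such as the number of trajectories crossing the cell. The binary 3x3x3 neighbourhood weights are likewise an illustrative assumption. The computation is sequential rather than distributed on Apache Spark.

```python
import math
from itertools import product


def gi_star(grid):
    """Getis-Ord Gi* z-score for every cell of a dense grid indexed grid[x][y][t].

    Hypothetical helper for illustration: every cell of the study area must be
    present (empty cells hold 0), and each cell's value is assumed to be a
    trajectory-derived quantity such as the number of trajectories crossing it.
    """
    nx, ny, nt = len(grid), len(grid[0]), len(grid[0][0])
    values = [grid[x][y][t] for x, y, t in product(range(nx), range(ny), range(nt))]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two cells")
    mean = sum(values) / n
    # Population standard deviation S, as used in the Gi* formula.
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if s == 0:
        raise ValueError("values are constant; Gi* is undefined")

    scores = {}
    for x, y, t in product(range(nx), range(ny), range(nt)):
        # Assumed binary weights: w_ij = 1 for every cell j in the 3x3x3
        # spatio-temporal block around i (i itself included), 0 otherwise.
        nbrs = [
            (x + dx, y + dy, t + dt)
            for dx, dy, dt in product((-1, 0, 1), repeat=3)
            if 0 <= x + dx < nx and 0 <= y + dy < ny and 0 <= t + dt < nt
        ]
        w_sum = len(nbrs)  # sum_j w_ij; also equals sum_j w_ij**2 for 0/1 weights
        local = sum(grid[a][b][c] for a, b, c in nbrs)
        num = local - mean * w_sum
        den = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        # den is 0 only when the neighbourhood of i covers the whole grid.
        scores[(x, y, t)] = num / den if den > 0 else math.nan
    return scores


# Example: a 5x5 spatial grid over 3 time steps with one dense 3x3 region.
grid = [[[10 if 1 <= x <= 3 and 1 <= y <= 3 else 1 for t in range(3)]
         for y in range(5)] for x in range(5)]
z = gi_star(grid)
print(round(z[(2, 2, 1)], 2), round(z[(0, 0, 0)], 2))
```

Cells with a large positive z-score (for instance above 1.96, the two-sided 5% level under a normal approximation) would be reported as hot spots. The per-cell neighbourhood sums are independent of one another. That independence is what makes the computation a natural candidate for parallelisation, although the sketch above does not attempt it.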
