Parallel grid-based density peak clustering of big trajectory data

With the widespread adoption of data-intensive applications such as navigation systems for mobile devices and unmanned vehicles, trajectory data analysis has become a key research area. One of its main tasks is trajectory clustering, which automatically groups similar trajectories into clusters. Density Peak Clustering (DPC) is widely used for this task because it is fast and requires few user-specified parameters. However, its performance does not scale well to large datasets. To address this issue, this paper proposes an efficient parallel trajectory clustering algorithm named Tra-PDPC (Trajectory-Parallel DPC). It consists of three steps: trajectory division and partitioning, trajectory similarity calculation, and clustering, all designed to run in a distributed fashion using the Spark programming model. In the first step, a scheme is proposed that divides trajectories into sub-trajectories based on the density of local grid areas. In the second step, a combined similarity measure based on both Euclidean space and grid space is defined to compute the similarity between sub-trajectories. Finally, a distributed version of DPC is applied, which substantially improves clustering speed. Experiments on multiple large realistic trajectory datasets demonstrate that Tra-PDPC considerably decreases runtime while maintaining high accuracy.
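The abstract states that trajectories are divided into sub-trajectories according to local grid area density but does not give the exact rule. The following is therefore only an illustrative sketch under explicit assumptions, not the Tra-PDPC division scheme: points are mapped to square cells of a hypothetical size `cell_size`, a cell's density is the number of trajectory points falling into it across the whole dataset, and a trajectory is cut wherever it crosses between a "dense" cell (count at least a hypothetical `threshold`) and a "sparse" one. The names `grid_density` and `split_by_density` are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(p: Point, cell_size: float) -> Cell:
    """Map a point to the integer (column, row) index of its square grid cell."""
    # Floor division keeps cells consistent for negative coordinates as well.
    return (int(p[0] // cell_size), int(p[1] // cell_size))


def grid_density(trajectories: List[List[Point]], cell_size: float) -> Dict[Cell, int]:
    """Count how many trajectory points fall into each grid cell (assumed density)."""
    counts = Counter()
    for traj in trajectories:
        for p in traj:
            counts[cell_of(p, cell_size)] += 1
    return dict(counts)


def split_by_density(
    traj: List[Point],
    density: Dict[Cell, int],
    cell_size: float,
    threshold: int,
) -> List[List[Point]]:
    """Hypothetical division rule: start a new sub-trajectory whenever the
    trajectory moves between a dense cell (count >= threshold) and a sparse one."""
    if not traj:
        return []
    subs: List[List[Point]] = [[traj[0]]]
    prev_dense = density.get(cell_of(traj[0], cell_size), 0) >= threshold
    for p in traj[1:]:
        dense = density.get(cell_of(p, cell_size), 0) >= threshold
        if dense != prev_dense:
            subs.append([])  # density class changed: open a new sub-trajectory
        subs[-1].append(p)
        prev_dense = dense
    return subs
```

In a Spark job, the counting in `grid_density` would map naturally onto emitting `(cell, 1)` pairs and aggregating them with `reduceByKey`; once the cell counts are broadcast to the workers, each trajectory can be split independently without further communication.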

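For reference, the core of standard Density Peak Clustering can be sketched on a single machine as follows. This is a minimal illustration of the original DPC idea, not the paper's distributed Tra-PDPC implementation. It assumes a precomputed pairwise distance matrix (for trajectories, this would come from a sub-trajectory similarity measure), uses a cutoff kernel for the local density `rho`, and selects centres automatically as the points with the largest `rho * delta` instead of inspecting a decision graph. The function name `density_peaks` and its parameters `d_c` and `n_clusters` are hypothetical.

```python
from typing import List, Sequence


def density_peaks(dist: Sequence[Sequence[float]], d_c: float, n_clusters: int) -> List[int]:
    """Cluster n points given an n x n distance matrix; returns one label per point."""
    n = len(dist)
    if n == 0:
        return []
    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from highest to lowest density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point gets its largest distance to any other point.
            delta[i] = max(dist[i][j] for j in range(n))
            continue
        # delta_i: distance to the closest point that precedes i in density order.
        best = min(order[:rank], key=lambda j: dist[i][j])
        delta[i] = dist[i][best]
        nearest_higher[i] = best
    # Centres: the densest point plus the remaining points with the largest rho * delta.
    rest = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centres = [order[0]] + rest[: n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # In decreasing density, each non-centre point joins the cluster of its
    # nearest higher-density neighbour, which has already been labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels
```

Both the density and the `delta` computations touch every pair of points, so this single-machine version costs O(n²) time and memory. That quadratic cost is the scaling problem that motivates distributing the work across a cluster.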