Distributed Density Peak Clustering of Trajectory Data on Spark

With the widespread use of mobile devices and GPS, trajectory data mining has become an active research field. Many applications collect huge amounts of trajectory data, which raises the problem of how to mine this data efficiently. To process large batches of trajectory data, this paper proposes a distributed trajectory clustering algorithm based on density peak clustering, named DTR-DPC. During the trajectory partitioning and division stage, the proposed method partitions the trajectory data into dense and sparse areas and applies a different trajectory division method to each kind of area. The algorithm then replaces each dense area with a single abstract trajectory that fits the distribution of trajectory points in that area, which reduces the number of distance calculations. Finally, a novel density peak clustering-based method for Spark (E-DPC), which requires limited human intervention, is applied. Experimental results on several large trajectory datasets show that the proposed approach greatly reduces the runtime of trajectory clustering while achieving high accuracy.
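
For readers unfamiliar with density peak clustering, the following is a minimal single-machine sketch of the baseline idea that the abstract builds on: each item gets a local density (how many neighbors lie within a cutoff) and a distance to its nearest denser item, and items that score high on both are taken as cluster centers. It is not the paper's DTR-DPC or E-DPC and involves no Spark. The function name `density_peaks`, the parameters `d_c` and `n_clusters`, the use of plain 2-D points with Euclidean distance in place of a trajectory distance, and the toy data are all assumptions made for illustration.

```python
import math

def density_peaks(points, d_c, n_clusters):
    """Baseline density peak clustering on 2-D points (illustrative only)."""
    n = len(points)
    # Pairwise Euclidean distances (an assumption; trajectories would
    # need a trajectory distance measure instead).
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points strictly closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n    # distance to the nearest point visited earlier
    parent = [-1] * n    # index of that nearest earlier (denser) point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor: use its max distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    # Centers: the n_clusters points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Every other point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels

# Toy data (hypothetical): two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta, labels = density_peaks(points, d_c=1.0, n_clusters=2)
print(labels)
```

In this sketch the number of clusters is passed in by hand. The abstract states that E-DPC requires only limited human intervention, but it does not say how, so that part is left out here.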
