An assessment on the performance of the shape functions in clustering based on representative trajectories of dense areas

ABSTRACT The study of the trajectories of people, vehicles, and animals has many applications. Clustering is one way to extract movement behaviors from trajectories. Because trajectories behave in complex ways, different geometric criteria, such as distance, shape, sinuosity, complexity, and orientation, are used to calculate trajectory (dis)similarity, depending on the type of data. Comparing trajectories by shape is one approach to measuring their similarity for clustering. Various functions and descriptors have been proposed for comparing two trajectories by shape, but their efficiency in trajectory clustering has not been evaluated. In this paper, shape-based trajectory similarity is evaluated for trajectory clustering. The turning, signature, tangent, and radius vector functions and the shape context descriptor are used for this evaluation. In addition, the shape similarity of trajectories is assessed from the perspective of curvature, sinuosity, and complexity. Because the criteria used are not distances, clustering trajectories with them faces some limitations compared with previous clustering algorithms. To overcome these limitations, a framework called Clustering based on Representative Trajectories of Dense Areas (CRTDA) is proposed for the automatic clustering of trajectories. This framework can also be used to cluster points and trajectories based on other geometric criteria, such as distance. Evaluations were performed on one simulated dataset and three real datasets, comprising pedestrian, car, and aircraft tracks. Finally, the criteria were compared in practice in terms of the diversity and quality of valid clusters and execution time.
According to the results, the turning function and shape context performed best in terms of the variety and number of clusters, with means of seven and four clusters, respectively. In terms of clustering quality, the shape context and radius vector functions performed best, with mean silhouette indices of 0.626 and 0.477, respectively. In terms of execution time, the signature and MA-sinuosity functions performed best, averaging 43.2 and 49.5 seconds, respectively.
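
To illustrate one of the shape functions named above, the following is a minimal sketch of a turning function for a polyline trajectory: the cumulative heading of the path expressed as a step function of normalized arc length. It is not the implementation evaluated in the paper. The function names `turning_function` and `turning_distance` are hypothetical, and comparing two turning functions by their L2 difference is an assumption (a common choice in shape matching). The sketch also applies no rotation normalization, so two trajectories with the same shape but different initial headings are treated as dissimilar.

```python
import math


def turning_function(points):
    """Return (breaks, angles) describing a step function.

    angles[i] is the cumulative heading (radians) on the interval
    [breaks[i], breaks[i + 1]) of normalized arc length in [0, 1].
    """
    segs = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    segs = [(dx, dy) for dx, dy in segs if dx or dy]  # drop zero-length segments
    if not segs:
        raise ValueError("need at least two distinct points")

    lengths = [math.hypot(dx, dy) for dx, dy in segs]
    total = sum(lengths)

    # Start from the heading of the first segment, then accumulate signed turns.
    angles = [math.atan2(segs[0][1], segs[0][0])]
    for (ax, ay), (bx, by) in zip(segs, segs[1:]):
        # Signed angle between consecutive segments, in (-pi, pi].
        turn = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        angles.append(angles[-1] + turn)

    breaks = [0.0]
    for length in lengths:
        breaks.append(breaks[-1] + length / total)
    breaks[-1] = 1.0  # guard against floating-point drift
    return breaks, angles


def turning_distance(a, b):
    """L2 distance between the turning functions of polylines a and b (assumed metric)."""
    breaks_a, angles_a = turning_function(a)
    breaks_b, angles_b = turning_function(b)
    cuts = sorted(set(breaks_a) | set(breaks_b))

    i = j = 0
    acc = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        # Advance to the step of each function that covers [lo, hi).
        while breaks_a[i + 1] <= lo:
            i += 1
        while breaks_b[j + 1] <= lo:
            j += 1
        acc += (angles_a[i] - angles_b[j]) ** 2 * (hi - lo)
    return math.sqrt(acc)
```

Because arc length is normalized and only headings are used, this sketch is invariant to translation and uniform scaling. A matrix of such pairwise values could serve as the dissimilarity input to a clustering step, although how CRTDA consumes it is not specified in the abstract.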
