Permutation-test-based clustering method for detection of dynamic patterns in spatio-temporal datasets

Abstract: Massive spatio-temporal data have been collected by earth observation systems to monitor changes in natural resources and the environment. To find the dynamic patterns embedded in these data, there is an urgent need to detect spatio-temporal clusters, i.e., groups of objects with similar attribute values that occur together across space and time. Among clustering methods, density-based methods are widely used to detect such clusters because they can find arbitrarily shaped clusters and require less prior knowledge (e.g., the number of clusters). However, they still require a series of user-specified parameters to identify high-density objects and to determine cluster significance. In practice, it is difficult for users to determine the optimal clustering parameters; therefore, existing density-based clustering methods typically exhibit unstable performance. To overcome these limitations, this paper develops a novel density-based spatio-temporal clustering method based on permutation tests, in which high-density objects and cluster significance are determined from statistical information in the dataset itself. First, the density of each object is defined based on the local variance, and a fast permutation test is conducted to identify high-density objects. Then, a two-stage grouping strategy groups high-density objects and their neighbors, so that spatio-temporal clusters are formed by minimizing the increase in inhomogeneity. Finally, a second, newly developed permutation test evaluates cluster significance by permuting cluster members. Experiments on both simulated and meteorological datasets show that the proposed method outperforms two state-of-the-art clustering methods, ST-DBSCAN and ST-OPTICS. The proposed method not only identifies inherent cluster patterns in spatio-temporal datasets but also greatly alleviates the difficulty of selecting appropriate clustering parameters.
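
The first step described above (local-variance density followed by a permutation test) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not specify the density formula, the neighborhood definition, or the "fast" permutation scheme, so everything here is an assumption made for illustration. In particular, the function names `local_variance` and `high_density_mask`, the use of plain variance over an object and its neighbors, the global shuffling of attribute values, and the parameters `n_perm` and `alpha` are all hypothetical.

```python
import numpy as np

def local_variance(values, neighbors):
    """Variance of the attribute over each object and its neighbors.

    Assumption: a low local variance is treated as a proxy for high density.
    """
    out = np.empty(len(values))
    for i, nb in enumerate(neighbors):
        idx = np.append(np.asarray(nb, dtype=int), i)  # neighbors plus the object itself
        out[i] = np.var(values[idx])
    return out

def high_density_mask(values, neighbors, n_perm=199, alpha=0.05, seed=0):
    """Flag objects whose local variance is unusually low under random relabeling.

    Hypothetical null model: attribute values are shuffled across all objects
    while the spatio-temporal neighborhood structure stays fixed.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = local_variance(values, neighbors)
    count = np.zeros(len(values), dtype=int)
    for _ in range(n_perm):
        permuted = rng.permutation(values)
        # Count permutations that look at least as homogeneous as the observed data.
        count += local_variance(permuted, neighbors) <= observed
    p_values = (count + 1) / (n_perm + 1)  # add-one correction keeps p > 0
    return p_values <= alpha, p_values

# Toy example: 60 objects on a line; the first 6 share one attribute value.
n = 60
values = np.concatenate([np.full(6, 5.0), np.arange(54) + 10.0])
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
mask, p_values = high_density_mask(values, neighbors)
```

In this toy setting the interior of the homogeneous block is flagged, because a shuffled neighborhood is almost never as uniform as the observed one. A real spatio-temporal dataset would need neighborhoods built from both spatial and temporal proximity, and testing many objects at once would normally call for a multiple-testing correction.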
