Windowed nearest neighbour method for mining spatio-temporal clusters in the presence of noise

Identifying clusters in a spatio-temporal data set is difficult because time and space are coupled and noise interferes. Previous methods employ either the window scanning technique or the spatio-temporal distance technique to identify spatio-temporal clusters. Although easily implemented, these methods suffer from subjectivity in the choice of classification parameters. In this article, we use the windowed kth nearest (WKN) distance to differentiate clusters from noise in spatio-temporal data. The WKN distance of an event is the geographic distance between the event and its kth geographically nearest neighbour among those events whose temporal distance to the event is no larger than half of a specified time window width (TWW). The windowed nearest neighbour (WNN) method is composed of four steps. First, a sequence of TWW factors is constructed, with which the WKN distances of events can be computed at different temporal scales. Second, the appropriate values of TWW (i.e. the appropriate temporal scales, at which the number of false positives may reach its lowest value when classifying the events) are indicated by the local maxima of the densities of identified clustered events, which are calculated over varying TWW by using the expectation-maximization algorithm. Third, the thresholds of the WKN distance for classification are derived with the determined TWW. Fourth, clustered events identified at the determined TWW are connected into clusters according to their density connectivity in geographic–temporal space. Results on simulated data and a seismic case study showed that the WNN method is efficient in identifying spatio-temporal clusters. The novelty of WNN is that it can not only identify spatio-temporal clusters with arbitrary shapes and different spatio-temporal densities but also significantly reduce the subjectivity in the classification process.
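The WKN distance defined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `wkn_distances`, the use of planar Euclidean distance as the geographic distance, and the choice to return infinity when fewer than k events fall inside the time window are all assumptions made here for illustration.

```python
import numpy as np

def wkn_distances(xy, t, k, tww):
    """Windowed kth nearest (WKN) distance of every event (illustrative sketch).

    xy  : (n, 2) planar coordinates (assumption: Euclidean geographic distance)
    t   : (n,) event times
    k   : order of the nearest neighbour
    tww : time window width; neighbours must lie within tww / 2 in time
    Returns an (n,) array; np.inf marks events with fewer than k windowed
    neighbours (an assumed convention, not taken from the paper).
    """
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(t, dtype=float)
    out = np.full(len(t), np.inf)
    for i in range(len(t)):
        # Candidate neighbours: temporal distance no larger than half the window.
        in_window = np.abs(t - t[i]) <= tww / 2.0
        in_window[i] = False  # an event is not its own neighbour
        if in_window.sum() < k:
            continue
        # Geographic distances to the candidates, then pick the kth smallest.
        d = np.linalg.norm(xy[in_window] - xy[i], axis=1)
        out[i] = np.partition(d, k - 1)[k - 1]
    return out

# Hypothetical toy data: three events close in space and time, one far away.
d = wkn_distances([[0, 0], [1, 0], [0, 2], [10, 10]], [0, 1, 2, 100], k=1, tww=4)
```

Evaluating such a function over a sequence of TWW values would give the WKN distances at different temporal scales, which is the input that the first step of the method describes. The brute-force loop is quadratic in the number of events; a spatial index would be the natural replacement for larger data sets.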
