Detecting feature from spatial point processes using Collective Nearest Neighbor

In a spatial point set, clustering patterns (features) are difficult to locate because of noise. Previous methods, whether grid-based or distance-based, separate features from noise but suffer from the parameter-choice problem: different parameter values may produce point patterns that differ in shape and area. This paper presents the Collective Nearest Neighbor method (CLNN) for identifying features. CLNN assumes that, in spatial data, clustered points and noise can be viewed as two homogeneous point processes. The process with the higher intensity is considered the feature, and the one with the lower intensity is treated as noise, so the two can be separated by their difference in intensity. With CLNN, points are first classified into feature and noise based on the kth nearest distance (the distance between a point and its kth nearest neighbor) at various values of k. CLNN then selects those classifications in which the separated classes (i.e., features and noise) are homogeneous Poisson processes that cannot be further divided. Finally, CLNN identifies clustered points by averaging the selected classifications. An evaluation on simulated data shows that CLNN significantly reduces the number of false points. A comparison of CLNN with the shared nearest neighbor, spatial scan, and classification entropy methods shows that CLNN produces the fewest false points. A case study using seismic data from southwestern China shows that CLNN is able to identify foreshocks of the Songpan earthquake (M = 7.2), which may help to locate its epicenter.
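The idea of classifying points by kth nearest distance at several values of k and then averaging the classifications can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions that go beyond the abstract: the two-class split uses a simple 1-D two-means on log distances as a stand-in for whatever classifier the paper uses, the selection step (keeping only classifications whose classes are homogeneous Poisson processes) is omitted, and all function names, the simulated data, and the range of k are hypothetical.

```python
import numpy as np


def kth_nn_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)  # column 0 is the point itself (distance 0)
    return dist[:, k]


def split_by_distance(d, n_iter=50):
    """Stand-in classifier: 1-D two-means on log distances.

    Returns a boolean array, True where the distance is short
    (high intensity, i.e. a candidate feature point).
    """
    x = np.log(d)
    lo, hi = x.min(), x.max()  # initial centers at the two extremes
    for _ in range(n_iter):
        is_feature = np.abs(x - lo) < np.abs(x - hi)
        lo, hi = x[is_feature].mean(), x[~is_feature].mean()
    return is_feature


def averaged_classification(points, ks):
    """Average the per-k classifications and keep points with a majority vote.

    Assumption: every k is used; the homogeneity-based selection of
    classifications described in the abstract is not implemented here.
    """
    votes = np.mean(
        [split_by_distance(kth_nn_distance(points, k)) for k in ks], axis=0
    )
    return votes >= 0.5


# Simulated data (hypothetical): low-intensity noise over the unit square
# plus a high-intensity feature confined to a small sub-square.
rng = np.random.default_rng(0)
noise = rng.uniform(0.0, 1.0, size=(200, 2))
feature = rng.uniform(0.4, 0.6, size=(200, 2))
points = np.vstack([noise, feature])
truth = np.r_[np.zeros(200, dtype=bool), np.ones(200, dtype=bool)]

labels = averaged_classification(points, ks=range(5, 16))
accuracy = (labels == truth).mean()
print(f"fraction of points labeled as simulated: {accuracy:.2f}")
```

Noise points that happen to fall inside or next to the dense sub-square are counted as errors here, even though they are indistinguishable from feature points by intensity alone, so the reported fraction understates how well the split tracks local intensity.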
