Multi-Level Clustering and its Visualization for Exploratory Spatial Analysis

Exploratory spatial analysis is increasingly necessary as ever larger spatial data sets are stored and managed on electromagnetic media. We propose an exploratory method that reveals a robust clustering hierarchy in 2-D point data. Our approach uses the Delaunay diagram to incorporate spatial proximity. It requires neither prior knowledge of the data set nor any preconditions. The method discovers multi-level clusters in only O(n log n) time, where n is the number of points in the data set. This efficiency allows us to construct and display a new type of tree graph that makes the complex hierarchy of clusters easier to understand. We show that clustering methods that adopt a raster-like or vector-like representation of proximity are not appropriate for spatial clustering. We evaluate the method experimentally on both synthetic and real data sets to illustrate its robustness.
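The general idea of deriving a cluster hierarchy from Delaunay proximity can be illustrated with a minimal Python sketch. It is not the method of the paper: the abstract does not state the edge-removal criterion, so the rule used here (cut edges longer than `factor` times the mean edge length of the current cluster, then recurse on the connected components) is an assumption chosen for illustration. The function names, the `factor` parameter, and the sample points are likewise hypothetical. The Delaunay step is a naive O(n^4) empty-circumcircle test that keeps the example self-contained; it does not reach the O(n log n) bound quoted above.

```python
from itertools import combinations
from math import dist

def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of non-degenerate triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det * orientation(a, b, c) > 0

def delaunay_edges(pts):
    """Naive Delaunay edges: keep every triangle whose circumcircle is empty."""
    edges = set()
    n = len(pts)
    for i, j, k in combinations(range(n), 3):
        a, b, c = pts[i], pts[j], pts[k]
        if orientation(a, b, c) == 0:
            continue  # skip collinear triples
        others = (pts[m] for m in range(n) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, p) for p in others):
            edges.update({(i, j), (i, k), (j, k)})
    return edges

def components(nodes, edges):
    """Connected components of the graph (nodes, edges), via union-find."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

def cluster_tree(pts, nodes, edges, factor=1.5):
    """Recursively split a cluster by cutting edges longer than factor * mean length.

    Assumed illustrative criterion, not the criterion of the paper.
    """
    node = {"members": sorted(nodes), "children": []}
    if not edges:
        return node
    mean_len = sum(dist(pts[u], pts[v]) for u, v in edges) / len(edges)
    kept = [(u, v) for u, v in edges if dist(pts[u], pts[v]) <= factor * mean_len]
    parts = components(nodes, kept)
    if len(parts) > 1:  # recurse only when the cluster actually splits
        for part in parts:
            inside = set(part)
            sub = [(u, v) for u, v in kept if u in inside]
            node["children"].append(cluster_tree(pts, part, sub, factor))
    return node

def show(node, depth=0):
    """Print the hierarchy as an indented tree of member indices."""
    print("  " * depth + str(node["members"]))
    for child in node["children"]:
        show(child, depth + 1)

# Hypothetical sample: two nearby groups of four points and one distant group.
points = [(0, 0), (1, 0), (0, 1), (1, 1.1),
          (5, 0), (6, 0), (5, 1), (6, 1.2),
          (30, 30), (31, 30), (30, 31), (31, 31.1)]
tree = cluster_tree(points, list(range(len(points))), delaunay_edges(points))
show(tree)
```

On this sample the first level separates the distant group from the rest, and the second level separates the two nearby groups, which is the kind of multi-level structure a tree display is meant to expose. For larger inputs the naive Delaunay step would need to be replaced by an O(n log n) construction, for example a library routine such as `scipy.spatial.Delaunay`.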
