Opening the black box: interactive hierarchical clustering for multivariate spatial patterns

Clustering is one of the most important tasks for geographic knowledge discovery. However, existing clustering methods have two severe drawbacks for this purpose. First, spatial clustering methods have so far focused mainly on searching for patterns within the spatial dimensions (usually 2-D or 3-D space), while more general-purpose high-dimensional (multivariate) clustering methods have very limited power to recognize spatial patterns that involve neighbors. Second, existing clustering methods tend to be "closed" and are not geared toward the interaction needed to support a human-led exploratory analysis. This research makes three contributions. (1) An effective and efficient hierarchical spatial clustering method that generates a 1-D spatial cluster ordering preserving all the hierarchical clusters. (2) A density- and grid-based hierarchical subspace clustering method that identifies high-dimensional clusters; the spatial cluster ordering is then integrated with this subspace clustering method to search for multivariate spatial patterns. (3) A fully open and interactive implementation of both methods, supported by various visualization techniques, which opens up the "black box" of the clustering process for easy understanding, steering, focusing, and interpretation. Finally, a working demo with US census data is presented.
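The first contribution depends on a 1-D ordering in which every hierarchical cluster occupies a contiguous run. The abstract does not say how that ordering is built, so what follows is only a minimal sketch under a stated assumption. It uses single-linkage clustering (Kruskal-style merging with union-find) as a stand-in for the paper's criterion. When two clusters merge, it concatenates their ordered member lists and records the merge distance at the new boundary. Each concatenation keeps both clusters contiguous, so cutting the ordering at every boundary whose recorded distance exceeds a threshold recovers the single-linkage clusters at that threshold. The names `cluster_ordering` and `clusters_at` are hypothetical. The sketch covers only the spatial dimensions and does not address the subspace (multivariate) part.

```python
from itertools import combinations
from math import dist


def cluster_ordering(points):
    """Hypothetical helper: 1-D cluster ordering via single linkage.

    Assumption: single linkage stands in for the paper's unspecified
    clustering criterion. Returns (order, gaps): order is a permutation of
    point indices, and gaps[i] is the merge distance that first joined
    order[i] and order[i + 1]. Every cluster at every level of the
    hierarchy is a contiguous run of `order`.
    """
    n = len(points)
    # Candidate edges: all pairs, sorted by Euclidean distance (Kruskal order).
    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))
    members = {i: [i] for i in range(n)}   # root -> ordered member list
    boundary = {i: [] for i in range(n)}   # root -> gaps inside that list

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        # Merge: concatenate the two ordered lists; the single new
        # adjacency between them is labeled with the merge distance w.
        parent[rj] = ri
        members[ri] = members[ri] + members.pop(rj)
        boundary[ri] = boundary[ri] + [w] + boundary.pop(rj)

    root = find(0)
    return members[root], boundary[root]


def clusters_at(order, gaps, threshold):
    """Hypothetical helper: split the ordering wherever a gap exceeds threshold."""
    clusters, current = [], [order[0]]
    for idx, gap in zip(order[1:], gaps):
        if gap > threshold:
            clusters.append(current)
            current = []
        current.append(idx)
    clusters.append(current)
    return clusters
```

For example, with `pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]` the ordering is `[0, 1, 2, 3, 4]`. Calling `clusters_at(order, gaps, 5)` gives `[[0, 1], [2, 3], [4]]`, and raising the threshold to 15 merges the first two groups. Scanning all pairs creates O(n²) candidate edges. The Euclidean minimum spanning tree is a subgraph of the Delaunay triangulation, so restricting candidates to Delaunay neighbors would produce the same single-linkage hierarchy with only O(n) edges in 2-D.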
