Clustering spatial data with a hybrid EM approach

Spatial clustering must consider similarity in the spatial space as well as object similarity in the normal attribute space: objects assigned to the same cluster should usually be close to one another spatially. The conventional expectation maximization (EM) algorithm is not suited to spatial clustering because it does not consider spatial information. The neighborhood EM (NEM) algorithm incorporates a spatial penalty term into the criterion function, but it requires many more iterations in every E-step. In this paper, we propose a Hybrid EM (HEM) approach that combines EM and NEM. Its computational complexity per pass lies between that of EM and that of NEM. Experiments also show that its clustering quality is better than that of EM and comparable to that of NEM.
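To illustrate the cost difference the abstract describes, the following is a minimal sketch of the two E-steps, not the paper's HEM method. It assumes one-dimensional Gaussian components and the commonly described NEM fixed-point update, in which each posterior is reweighted by exp(beta * sum of neighbor posteriors); the function names, the `neighbors` adjacency lists, and the `beta` parameter are hypothetical choices for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian (assumed component model)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def weighted_likelihoods(xs, weights, means, variances):
    """Unnormalized posteriors: mixing weight times component density."""
    return [
        [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        for x in xs
    ]


def em_e_step(xs, weights, means, variances):
    """Conventional EM E-step: one pass, attribute values only."""
    lik = weighted_likelihoods(xs, weights, means, variances)
    return [[value / sum(row) for value in row] for row in lik]


def nem_e_step(xs, weights, means, variances, neighbors, beta,
               max_iter=100, tol=1e-6):
    """NEM-style E-step (sketch): iterate a spatially penalized update.

    neighbors[i] lists the indices of the spatial neighbors of object i.
    Returns the posteriors and the number of inner iterations used.
    """
    lik = weighted_likelihoods(xs, weights, means, variances)
    post = em_e_step(xs, weights, means, variances)  # starting point
    n_clusters = len(weights)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        new_post = []
        for i, row in enumerate(lik):
            scores = []
            for c in range(n_clusters):
                # Spatial support: how strongly the neighbors belong to c.
                support = sum(post[j][c] for j in neighbors[i])
                scores.append(row[c] * math.exp(beta * support))
            total = sum(scores)
            new_post.append([s / total for s in scores])
        change = max(
            abs(a - b)
            for new_row, old_row in zip(new_post, post)
            for a, b in zip(new_row, old_row)
        )
        post = new_post
        if change < tol:
            break
    return post, iterations
```

With `beta = 0` the spatial term vanishes and the inner loop reproduces the plain EM posteriors after a single iteration; with `beta > 0` an object whose attributes are ambiguous is pulled toward the cluster of its spatial neighbors, at the price of repeating the inner loop until the posteriors stabilize.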
