A hybrid EM approach to spatial clustering

Spatial clustering must take spatial information into account, so the standard expectation-maximization (EM) algorithm, which maximizes likelihood alone, is inappropriate. Although the neighborhood EM (NEM) algorithm incorporates a spatial penalty term, its E-step requires many more iterations. To incorporate spatial information while avoiding much additional computation, we propose a hybrid EM (HEM) approach that combines EM and NEM. Early training is performed via a selective hard EM until the penalized likelihood criterion begins to decrease. Training then switches to NEM, which runs only one iteration of the E-step and serves as a fine-tuning stage. Spatial information is thus incorporated throughout HEM, and the computational complexity remains comparable to that of EM. Empirical results show that HEM needs a few more passes to converge after switching to NEM, and that the final clustering quality is close to or slightly better than that of standard NEM.
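The two-phase schedule described above can be illustrated with a minimal sketch for a one-dimensional Gaussian mixture. Several things here are assumptions rather than details taken from the abstract: plain hard EM stands in for the "selective hard EM", whose selection rule is not given here; the penalized criterion is taken in the usual NEM form U = L + βG with binary, symmetric neighbor weights; phase 1 also stops when U stalls, not only when it decreases; and all function names, the `beta` value, and the tolerance are hypothetical.

```python
import math

def gauss(x, mu, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def m_step(xs, c, k):
    """Weighted maximum-likelihood update of (weight, mean, variance)."""
    n, params = len(xs), []
    for m in range(k):
        nm = sum(c[i][m] for i in range(n)) + 1e-9    # guards empty clusters
        mu = sum(c[i][m] * xs[i] for i in range(n)) / nm
        var = sum(c[i][m] * (xs[i] - mu) ** 2 for i in range(n)) / nm + 1e-6
        params.append((nm / n, mu, var))
    return params

def hard_e_step(xs, params):
    """Assign each point entirely to its most probable component."""
    c = []
    for x in xs:
        scores = [pi * gauss(x, mu, var) for pi, mu, var in params]
        best = scores.index(max(scores))
        c.append([1.0 if m == best else 0.0 for m in range(len(params))])
    return c

def nem_e_step(xs, c, params, nbrs, beta):
    """One pass of the NEM update: posterior reweighted by neighbor memberships."""
    new_c = []
    for i, x in enumerate(xs):
        w = [pi * gauss(x, mu, var)
             * math.exp(beta * sum(c[j][m] for j in nbrs[i]))
             for m, (pi, mu, var) in enumerate(params)]
        total = sum(w) or 1e-300
        new_c.append([v / total for v in w])
    return new_c

def criterion(xs, c, params, nbrs, beta):
    """Assumed penalized criterion U = L + beta * G."""
    u = 0.0
    for i, x in enumerate(xs):
        for m, (pi, mu, var) in enumerate(params):
            cim = c[i][m]
            if cim > 0.0:  # likelihood and entropy terms of L
                p = max(pi * gauss(x, mu, var), 1e-300)
                u += cim * (math.log(p) - math.log(cim))
            # spatial term G: agreement with neighbors, halved per pair
            u += 0.5 * beta * cim * sum(c[j][m] for j in nbrs[i])
    return u

def hem(xs, nbrs, init_means, beta=1.0, tol=1e-6, max_iter=100):
    k = len(init_means)
    params = [(1.0 / k, mu, 1.0) for mu in init_means]
    prev, c = -math.inf, None
    # Phase 1: hard EM until U stops increasing.
    for _ in range(max_iter):
        c = hard_e_step(xs, params)
        params = m_step(xs, c, k)
        u = criterion(xs, c, params, nbrs, beta)
        if u <= prev:
            break
        prev = u
    # Phase 2: NEM with a single E-step iteration per pass.
    for _ in range(max_iter):
        c = nem_e_step(xs, c, params, nbrs, beta)
        params = m_step(xs, c, k)
        u = criterion(xs, c, params, nbrs, beta)
        if abs(u - prev) < tol:
            break
        prev = u
    labels = [row.index(max(row)) for row in c]
    return labels, params

# Toy data: 20 sites on a line, each neighboring the sites next to it.
xs = [0.1 * ((i * 7) % 5) for i in range(10)] + \
     [5.0 + 0.1 * ((i * 3) % 5) for i in range(10)]
nbrs = [[j for j in (i - 1, i + 1) if 0 <= j < len(xs)] for i in range(len(xs))]
labels, params = hem(xs, nbrs, init_means=[min(xs), max(xs)])
```

On this toy input the two well-separated groups of sites end up in different clusters. The sketch updates all memberships synchronously from the previous pass, which is one of several possible orderings.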
