A Divisive Initialisation Method for Clustering Algorithms

A method for the initialisation step of clustering algorithms is presented. It is based on the concept of a cluster as a high-density region of points. The search space is modelled as a set of d-dimensional cells. A sample of points is drawn and each point is placed in the cell that contains it. Cells are iteratively split as the number of points they receive increases. The regions of the search space with a higher density of points are considered good candidates to contain the true centers of the clusters. Preliminary experimental results show that the estimated centroids are of better quality than randomly chosen points. The accuracy of the clusters obtained by running the K-Means algorithm with two different initialisation techniques (random starting centers chosen uniformly from the dataset, and centers found by our method) is evaluated, and K-Means is shown to give better results with our initialisation method.
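The cell-splitting idea described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the splitting rule, the stopping criterion, or how centers are read off the dense cells, so every such detail here is an assumption. In particular, the function name `divisive_init`, the parameters `max_points` and `max_depth`, the halving of a cell along every dimension, and the use of the mean of the points in the k densest cells as candidate centers are all hypothetical choices made for illustration.

```python
import random


def divisive_init(points, k, max_points=20, max_depth=4):
    """Return k candidate centers taken from the densest cells.

    Illustrative sketch only: the split rule, thresholds and the final
    selection step are assumptions, not the method from the paper.
    """
    d = len(points[0])
    lo = [min(p[i] for p in points) for i in range(d)]
    hi = [max(p[i] for p in points) for i in range(d)]
    # Each pending cell is (lower corner, upper corner, points inside, depth).
    stack = [(lo, hi, list(points), 0)]
    leaves = []
    while stack:
        lo_c, hi_c, pts, depth = stack.pop()
        # A cell is kept as a leaf once it is sparse enough or deep enough.
        if len(pts) <= max_points or depth >= max_depth:
            leaves.append((lo_c, hi_c, pts))
            continue
        # Assumed split rule: halve the cell along every dimension.
        mid = [(a + b) / 2 for a, b in zip(lo_c, hi_c)]
        children = {}
        for p in pts:
            key = tuple(p[i] >= mid[i] for i in range(d))
            children.setdefault(key, []).append(p)
        # Only non-empty children are created.
        for key, child_pts in children.items():
            c_lo = [mid[i] if key[i] else lo_c[i] for i in range(d)]
            c_hi = [hi_c[i] if key[i] else mid[i] for i in range(d)]
            stack.append((c_lo, c_hi, child_pts, depth + 1))

    def density(cell):
        # Points per unit volume; the floor guards against zero-width cells.
        lo_c, hi_c, pts = cell
        volume = 1.0
        for a, b in zip(lo_c, hi_c):
            volume *= max(b - a, 1e-12)
        return len(pts) / volume

    leaves.sort(key=density, reverse=True)
    # Candidate center of a cell = mean of the points it contains.
    return [
        [sum(p[i] for p in pts) / len(pts) for i in range(d)]
        for _, _, pts in leaves[:k]
    ]


# Hypothetical data: one tight cluster near (5, 5) plus uniform noise.
random.seed(0)
cluster = [(random.gauss(5, 0.1), random.gauss(5, 0.1)) for _ in range(200)]
noise = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(50)]
print(divisive_init(cluster + noise, k=1))
```

The returned centers could then be passed as starting points to a K-Means implementation in place of uniformly random ones. Note that this simplified selection step can return several neighbouring cells from the same dense region when k is greater than 1; a practical variant would need some rule to keep the chosen cells apart.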
