Non-convex clustering using expectation maximization algorithm with rough set initialization

A method for non-convex clustering is described that integrates a minimal spanning tree (MST) based graph-theoretic technique with the expectation maximization (EM) algorithm under rough set initialization. EM provides the statistical model of the data and handles the associated uncertainties. Rough set initialization speeds up convergence and helps avoid the local minima problem, thereby enhancing the performance of EM. The MST step determines the non-convex clusters. Because the MST is built over the fitted Gaussians rather than the original data points, the time it requires is very low. These features are demonstrated on real-life datasets. The method is compared with related methods in terms of a cluster quality measure and computation time.
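
As a rough illustration of the MST step only, the sketch below groups Gaussian components into clusters by building an MST over their means and cutting long edges. It assumes the means have already been obtained from EM. The Euclidean distance between means, the fixed `cut_length` threshold, and the function names `mst_edges` and `group_components` are assumptions made for illustration, not details taken from the paper. Rough set initialization is not shown.

```python
import numpy as np

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, weight) tuples, one per MST edge.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree node that provides that cheapest link
    edges = []
    for _ in range(n - 1):
        # Pick the closest node that is not yet in the tree.
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        # The new node may offer cheaper links to the remaining nodes.
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges

def group_components(means, cut_length):
    """Cut MST edges longer than `cut_length` (an assumed threshold) and
    return one cluster label per Gaussian component."""
    n = len(means)
    root = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i, j, weight in mst_edges(means):
        if weight <= cut_length:
            root[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]
```

Each data point would then inherit the label of the Gaussian it is most likely to belong to. Since the tree is built over a handful of component means instead of every data point, its cost depends on the number of Gaussians, which is the source of the low running time noted above.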
