Clustering the imbalanced datasets using modified Kohonen self-organizing map (KSOM)

The distribution of data plays an important role in the success of the learning process in machine learning. Data sets with an imbalanced distribution may lead to biased results, especially in clustering. When the data are insufficient, the algorithm cannot form reliable clusters, which adds randomness to the grouping. Therefore, the KSOM algorithm is modified to improve the clustering process. The modification is based on the exploration and exploitation procedures of the Ant Clustering Algorithm (ACA). To investigate the effectiveness of the modified algorithm, three imbalanced data sets are chosen: glass, Wisconsin diagnostic breast cancer, and tropical wood. The results show that the modified KSOM is able to produce an accurate number of clusters, reduce the number of overlapping clusters, and slightly improve the accuracy percentage.
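For orientation, the baseline that the abstract modifies can be sketched as a plain Kohonen SOM. The abstract does not describe how the ACA exploration and exploitation steps are integrated, so the sketch below covers only the standard update rule (find the best matching unit, then pull nearby map nodes toward the sample). The function names, grid size, decay schedule, and toy data are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda n: sum((wi - xi) ** 2 for wi, xi in zip(weights[n], x)),
    )


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a plain Kohonen SOM (no ACA modification).

    All hyperparameters here are illustrative assumptions.
    Returns a dict mapping (row, col) grid positions to weight vectors.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per map node, initialised uniformly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    samples = list(data)
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius decay exponentially.
        decay = math.exp(-epoch / epochs)
        lr, sigma = lr0 * decay, sigma0 * decay
        rng.shuffle(samples)
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the map grid.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


if __name__ == "__main__":
    # Hypothetical imbalanced toy data: 40 majority points, 5 minority points.
    rng = random.Random(1)
    majority = [[rng.uniform(0.0, 0.2), rng.uniform(0.0, 0.2)] for _ in range(40)]
    minority = [[rng.uniform(0.8, 1.0), rng.uniform(0.8, 1.0)] for _ in range(5)]
    som = train_som(majority + minority)
    hits = {}
    for point in majority + minority:
        node = best_matching_unit(som, point)
        hits[node] = hits.get(node, 0) + 1
    print(hits)  # number of samples mapped to each node
```

On imbalanced data such as this toy set, most map nodes tend to be drawn toward the majority class because it supplies most of the updates, which is one way to picture the bias the abstract refers to.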
