Comparison of AIS and fuzzy c-means clustering methods on the classification of breast cancer and diabetes datasets

Data reduction is an indispensable part of many pattern classification processes. When the number of samples is excessive, sample reduction (data reduction) algorithms can be used to shorten processing time while keeping the subsequent results reliable. Many methods have been used for data reduction; fuzzy c-means, a clustering algorithm, is one of the most widely used. In this study, we applied a different clustering algorithm, an artificial immune system (AIS), to the data reduction process. We ran the performance evaluation experiments on the standard Chainlink and Iris datasets, while the main application was conducted on the Wisconsin Breast Cancer and Pima Indian Diabetes datasets, which were taken from the University of California, Irvine Machine Learning Repository. For these datasets, the performance of the AIS in data reduction was compared with that of the fuzzy c-means clustering algorithm, with a multilayer perceptron artificial neural network used as the classifier after data reduction. With the AIS, the maximum classification accuracies were 73.96% for the Pima Indian Diabetes dataset and 97.80% for the Wisconsin Breast Cancer dataset, and the corresponding compression rates were 80% and 40%. With fuzzy c-means clustering, the maximum accuracies were 63% and 86.69% for the Pima Indian Diabetes and Wisconsin Breast Cancer datasets, respectively, and the corresponding compression rates were 90% and 70%. Averaged over the compression rates tested, the AIS reached a mean classification accuracy of 70.07% for the Pima Indian Diabetes dataset, compared with 47.64% for fuzzy c-means. For the Wisconsin Breast Cancer dataset, the mean classification accuracies of the AIS and fuzzy c-means methods were 94.90% and 75.43%, respectively.
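
The fuzzy c-means baseline described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in the study. It assumes that the compression rate is the fraction of samples removed, that reduction is done separately per class, and that the cluster centers serve as the reduced training set. The function names `fuzzy_c_means` and `reduce_by_class` and the fuzzifier value `m=2.0` are illustrative choices, not taken from the paper.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center (floored to avoid /0).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def reduce_by_class(X, y, compression_rate, **kwargs):
    """Replace the samples of each class with fuzzy c-means centers.

    ASSUMPTION: compression_rate is the fraction of samples removed,
    so a rate of 0.8 keeps about 20% as many points per class.
    """
    X_red, y_red = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        k = max(1, int(round(len(Xc) * (1.0 - compression_rate))))
        centers, _ = fuzzy_c_means(Xc, k, **kwargs)
        X_red.append(centers)
        y_red.append(np.full(k, label))
    return np.vstack(X_red), np.concatenate(y_red)
```

Under these assumptions, the reduced pair returned by `reduce_by_class(X, y, 0.8)` would be passed to a classifier, such as a multilayer perceptron, in place of the full training set.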
