Comparative Study of K-means and Fuzzy C-means Algorithms on The Breast Cancer Data

Breast cancer is one of the most common cancers worldwide. Research on detecting it at an early stage continues, because the likelihood of cure is highest when the disease is caught early. This work had two main objectives: first, to compare the performance of the k-means and fuzzy c-means (FCM) clustering algorithms; and second, to examine whether combining different computational measures for k-means and FCM can yield better clustering accuracy. Both algorithms were applied to the Breast Cancer Wisconsin (BCW) dataset to understand the impact of clustering on breast cancer data. The k-means algorithm was executed with varying centroid, distance, split method, threshold, epoch, BCW attribute, and number of iterations, while FCM was executed with varying fuzziness value and termination condition. For k-means, the combination of variance and same centroid gave the better outcome. The highest and lowest classification accuracies were (94.7%, 77.1%) for the foggy centroid and (94.4%, 88.5%) for the random centroid. The overall average positive prediction accuracy obtained by this approach was approximately 92%. For FCM, the highest and lowest classification accuracies were (97.2%, 91.1%), (97.2%, 90.9%), (97.8%, 90.4%), and (97.1%, 90.2%) for different combinations of fuzziness and termination criteria. The corresponding highest and lowest average classification accuracies were (95.7%, 94.7%), (95.9%, 93.6%), (95.3%, 94.2%), and (95.6%, 93.7%). K-means was faster and more consistent in computation time, since FCM required more time to carry out its fuzzy calculations and iterations.
The findings of this work provide a detailed understanding of the computational parameters used with the k-means and FCM algorithms. The computational results indicate that FCM performed better and more consistently than k-means when executed with different iterations, fuzziness values, and termination criteria. Because classification accuracy matters more than computation time, FCM is the more capable of the two for classifying the BCW dataset.
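
As an illustration only, the two parameters that drive FCM here (the fuzziness value and the termination condition) can be sketched next to a plain k-means loop. This is a minimal NumPy sketch and not the authors' implementation: it uses the textbook FCM update rules and random-point k-means initialization, and it runs on synthetic two-cluster data rather than the BCW dataset. The function names, the parameter names `m`, `tol`, and `max_iter`, and the data generator are all assumptions made for this example. The k-means options studied in the paper (foggy centroid, split method, threshold, and so on) are not reproduced.

```python
import numpy as np


def k_means(X, n_clusters=2, max_iter=100, seed=0):
    """Plain k-means with hard assignments (random data points as initial centroids)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], n_clusters, replace=False)]
    for iteration in range(1, max_iter + 1):
        # Distance from every sample to every centroid: shape (n_samples, n_clusters).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster became empty.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels, iteration


def fuzzy_c_means(X, n_clusters=2, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Textbook FCM. m > 1 is the fuzziness value; tol is the termination
    threshold on the largest change in any membership between iterations."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for iteration in range(1, max_iter + 1):
        W = U ** m
        # Centroids are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U, iteration


def two_cluster_accuracy(pred, truth):
    """Accuracy for two clusters, allowing for swapped cluster labels."""
    agree = np.mean(pred == truth)
    return max(agree, 1.0 - agree)


# Synthetic stand-in data (NOT the BCW dataset): two Gaussian blobs, 9 features.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(2.0, 1.0, (100, 9)), rng.normal(8.0, 1.0, (100, 9))])
y = np.repeat([0, 1], 100)

_, km_labels, km_iters = k_means(X)
_, U, fcm_iters = fuzzy_c_means(X, m=2.0, tol=1e-5)
fcm_labels = U.argmax(axis=1)  # harden fuzzy memberships for comparison

print("k-means accuracy:", two_cluster_accuracy(km_labels, y), "iterations:", km_iters)
print("FCM accuracy:    ", two_cluster_accuracy(fcm_labels, y), "iterations:", fcm_iters)
```

Varying `m` and `tol` in such a sketch mirrors the kind of fuzziness and termination combinations the study reports. Each FCM iteration updates a full membership matrix rather than a single label per sample, which is consistent with the longer computation time noted above.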
