Performance study of K-nearest neighbor classifier and K-means clustering for predicting the diagnostic accuracy

Data management is a major challenge in the healthcare sector, where patient numbers are rising with population growth and lifestyle changes. Data analytics and big data are increasingly used to address analytical problems through machine learning techniques. Cancer is a growing concern in both developed and developing countries, and it can be fatal if not diagnosed at an early stage. Late diagnosis, and the delayed treatment that follows, lowers the chance of survival. Early detection is therefore critical to improving cancer outcomes. This study aims at early diagnosis of cancer using more efficient analytical techniques. Prediction accuracy also plays an important role in improving the quality of care and thereby the survival rate. The datasets for this study are taken from the UCI Machine Learning Repository and were prepared by the University of Wisconsin Hospitals. For diagnosis and classification, the K-Nearest Neighbor (KNN) classifier is applied with different values of K, introducing a process called KNN Clustering. The performance of KNN is then compared with that of K-Means clustering on the same datasets.
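
The comparison described above can be illustrated with a minimal sketch. It is not the study's implementation. The twelve two-feature points are hypothetical toy data, not the Wisconsin dataset. The function names, the leave-one-out evaluation, the deterministic centroid initialization, and the majority-label scoring of clusters are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT the Wisconsin dataset): two features per sample.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (2, 3),
     (8, 8), (8, 9), (9, 8), (9, 9), (8.5, 8.5), (7, 8)]
y = [0] * 6 + [1] * 6  # 0 and 1 stand in for two diagnostic classes


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def knn_loo_accuracy(X, y, k):
    """Leave-one-out accuracy of KNN (an assumed evaluation scheme)."""
    hits = 0
    for i, (point, label) in enumerate(zip(X, y)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, point, k) == label
    return hits / len(X)


def kmeans(X, k, iters=10):
    """Lloyd's algorithm; centroids start at the first k points (an assumption
    chosen for reproducibility, not a recommended initialization)."""
    centroids = [tuple(p) for p in X[:k]]
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: euclidean(x, centroids[c])) for x in X]
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Final assignment against the last centroid positions.
    return [min(range(k), key=lambda c: euclidean(x, centroids[c])) for x in X]


def cluster_accuracy(assign, y):
    """Score clusters by giving each one the majority true label of its members."""
    correct = 0
    for c in set(assign):
        labels = [label for a, label in zip(assign, y) if a == c]
        correct += Counter(labels).most_common(1)[0][1]
    return correct / len(y)


# KNN for several values of K, then K-means with two clusters on the same data.
knn_accuracy = {k: knn_loo_accuracy(X, y, k) for k in (1, 3, 5)}
kmeans_accuracy = cluster_accuracy(kmeans(X, 2), y)

for k, acc in knn_accuracy.items():
    print(f"KNN (K={k}) leave-one-out accuracy: {acc:.2f}")
print(f"K-means (2 clusters) majority-label accuracy: {kmeans_accuracy:.2f}")
```

KNN is supervised and uses the class labels directly, whereas K-means is unsupervised and sees the labels only at scoring time. The two accuracy figures are therefore comparable only under a cluster-to-label mapping such as the one assumed here.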