Optimization of K-NN algorithm by clustering and reliability coefficients: application to breast-cancer diagnosis

Abstract There is a growing trend towards data mining applications in medicine. Medical practitioners have explored various algorithms to assist their work, and the diagnosis of breast cancer is one such application. Machine learning algorithms are valuable for many medical problems: they can help to diagnose a disease, to identify its causes, or to predict the outcome of a treatment. The K-Nearest Neighbors (KNN) algorithm is one of the simplest and is widely used in predictive analysis. To improve its performance and reduce its running time, this paper proposes a new way to speed up KNN based on clustering and attribute filtering. It also includes a further improvement based on reliability coefficients, which ensures a more accurate classification. The contributions of this paper are thus three-fold: (i) the clustering of class instances, (ii) the selection of the most significant attributes, and (iii) the weighting of similarities by reliability coefficients. The proposed approach outperformed most known classification techniques, with an average F-measure exceeding 94% on the considered breast-cancer dataset.
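To make contributions (i) and (iii) concrete, here is a minimal Python sketch. It is not the authors' implementation, since the abstract does not give the details. It rests on several assumptions: each class is summarized by a few k-means centroids, the reliability coefficient of a centroid is taken to be the fraction of training instances nearest to it that share its class, and similarity is computed as 1 / (1 + Euclidean distance). Attribute selection (ii) is omitted. The names `fit`, `predict`, and `clusters_per_class`, as well as the toy data, are hypothetical.

```python
import math
from collections import defaultdict

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, c, iters=20):
    """Plain k-means with a deterministic start (the first c points)."""
    centroids = [list(p) for p in points[:c]]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            groups[j].append(p)
        # An empty group keeps its previous centroid.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids

def fit(X, y, clusters_per_class=2):
    """Replace each class by a few centroids, each with a reliability score."""
    by_class = defaultdict(list)
    for p, label in zip(X, y):
        by_class[label].append(p)
    prototypes = []
    for label, pts in by_class.items():
        for centroid in kmeans(pts, min(clusters_per_class, len(pts))):
            prototypes.append((centroid, label))
    # ASSUMPTION: reliability = share of the training instances closest to a
    # centroid that carry the same label as that centroid.
    hits = [0] * len(prototypes)
    total = [0] * len(prototypes)
    for p, label in zip(X, y):
        j = min(range(len(prototypes)), key=lambda i: dist(p, prototypes[i][0]))
        total[j] += 1
        hits[j] += int(prototypes[j][1] == label)
    return [(c, label, hits[i] / total[i] if total[i] else 0.0)
            for i, (c, label) in enumerate(prototypes)]

def predict(model, x, k=3):
    """Vote among the k nearest centroids, weighting similarity by reliability."""
    nearest = sorted(model, key=lambda m: dist(x, m[0]))[:k]
    votes = defaultdict(float)
    for centroid, label, reliability in nearest:
        votes[label] += reliability / (1.0 + dist(x, centroid))
    return max(votes, key=votes.get)

# Toy two-attribute data (made up for illustration, not the breast-cancer dataset).
X = [[1, 1], [1, 2], [2, 1], [2, 2], [8, 8], [8, 9], [9, 8], [9, 9]]
y = ["benign"] * 4 + ["malignant"] * 4
model = fit(X, y)
print(predict(model, [1.5, 1.5]))
print(predict(model, [8.5, 8.5]))
```

Under these assumptions, the speed-up comes from comparing a query against a handful of centroids instead of every training instance, and the reliability weight reduces the influence of centroids that sit in regions where classes overlap.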
