IMPROVING CLASSIFICATION PERFORMANCE OF K-NEAREST NEIGHBOUR BY HYBRID CLUSTERING AND FEATURE SELECTION FOR NON-COMMUNICABLE DISEASE PREDICTION

Non-communicable diseases (NCDs), such as diabetes mellitus, cardiovascular diseases, liver disease and cancers, account for a high mortality rate worldwide. NCD prediction models suffer from problems such as redundant data, missing data, noisy classes and irrelevant attributes. This paper proposes a novel NCD prediction model to improve accuracy. Our model comprises k-means as the clustering technique, Weight by SVM as the feature selection technique and k-nearest neighbour (k-NN) as the classifier. The results show that k-means + Weight by SVM + k-NN improved classification accuracy on most of the NCD datasets tested (accuracy; AUC), such as the Pima Indian Dataset (96.82; 0.982), Breast Cancer Diagnosis Dataset (97.36; 0.997), Breast Cancer Biopsy Dataset (96.85; 0.994), Colon Cancer (99.41; 1.000), ECG (97.80; 1.000) and Liver Disorder (97.97; 0.998).
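
The three stages named in the abstract can be illustrated with a minimal sketch in plain Python. It rests on several assumptions that the abstract does not confirm: k-means is used to discard rows whose class label disagrees with the majority label of their cluster, "Weight by SVM" is read as ranking features by the absolute coefficients of a linear SVM (trained here by simple subgradient descent), and the toy data, function names and hyperparameters are all hypothetical. It is not the authors' implementation.

```python
import math
import random


def kmeans(X, k, iters=50, seed=0):
    """Lloyd's k-means; returns one cluster index per row of X."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def drop_cluster_outliers(X, y, k=2):
    """Assumed cleaning rule: keep rows whose label matches their cluster's majority label."""
    labels = kmeans(X, k)
    keep = []
    for c in range(k):
        ys = [yi for yi, lab in zip(y, labels) if lab == c]
        if ys:
            majority = max(set(ys), key=ys.count)
            keep += [i for i in range(len(X)) if labels[i] == c and y[i] == majority]
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def svm_weights(X, y, epochs=200, lr=0.01, lam=0.01):
    """Linear SVM (hinge loss, subgradient descent); returns |w_j| per feature."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, yi in zip(X, y):
            t = 1 if yi == 1 else -1  # map labels {0, 1} to {-1, +1}
            margin = t * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            for j in range(len(w)):
                grad = lam * w[j] - (t * x[j] if margin < 1 else 0.0)
                w[j] -= lr * grad
            if margin < 1:
                b += lr * t
    return [abs(wj) for wj in w]


def top_features(weights, n):
    """Indices of the n largest weights, in ascending index order."""
    return sorted(sorted(range(len(weights)), key=lambda j: -weights[j])[:n])


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    votes = [y_train[i] for i in nearest]
    return max(set(votes), key=votes.count)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise,
# and the last row is deliberately mislabelled.
X = [[0.0, 0.3], [0.5, -0.2], [1.0, 0.1], [0.2, -0.1],
     [5.0, 0.2], [5.5, -0.3], [6.0, 0.0], [5.2, 0.1], [0.4, 0.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

X_clean, y_clean = drop_cluster_outliers(X, y)   # stage 1: k-means cleaning
weights = svm_weights(X_clean, y_clean)          # stage 2: SVM-based weights
selected = top_features(weights, 1)
X_sel = [[row[j] for j in selected] for row in X_clean]

for query in ([0.3, 0.1], [5.4, -0.2]):          # stage 3: k-NN on kept features
    q = [query[j] for j in selected]
    print(query, "->", knn_predict(X_sel, y_clean, q))
```

On real data the features would normally be scaled before the k-means and k-NN steps, since both rely on Euclidean distance, and accuracy and AUC would be estimated with cross-validation rather than on a handful of query points.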
