Survey on Classification and Feature Selection Approaches for Disease Diagnosis

Patient case similarity refers to finding and retrieving cases in the knowledge base whose features are similar to those of a given patient. The knowledge base contains data drawn from demographics, progress notes, medications, past medical history, discharge summaries and laboratory values. Data pre-processing is the first, and an essential, step in the modelling process. Its aim is to make the classification process more effective by supplying a representative and consistent data set. Pre-processing includes data cleaning, data transformation and feature selection. Once a model has been trained, a new case is classified by submitting its sample to the trained model. Various feature selection and classification approaches have been proposed in the literature, but it is not clear which feature selection approach leads to better classification performance. This study therefore presents a survey of feature selection and classification approaches applied to seven benchmark disease data sets obtained from the UCI repository.
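The pipeline described above (cleaning, transformation, feature selection, then classifying a new case by comparing it with stored cases) can be illustrated with a minimal, dependency-free Python sketch. It rests on several assumptions that do not come from the survey:

- all features are numeric;
- cleaning means mean imputation;
- transformation means min-max scaling;
- feature selection is a simple filter that ranks features by absolute Pearson correlation with a binary class label (a simplification of correlation-based methods, which also penalise redundancy between features);
- the case-similarity classifier is a k-nearest-neighbour majority vote.

All function names are hypothetical. The four-row data set is synthetic and is not one of the UCI disease data sets.

```python
import math
from statistics import mean


def impute_missing(rows):
    """Data cleaning: replace each None with the mean of the observed values in
    its column. Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    col_means = [mean([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def fit_min_max(rows):
    """Data transformation: record each column's (min, max) on the training data."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_row(row, bounds):
    """Rescale one sample with the training bounds; constant columns map to 0.0.
    Values outside the training range fall outside [0, 1]."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_features(rows, labels, k):
    """Filter-style feature selection: return the indices (in column order) of the
    k columns with the largest |r| against the 0/1 class label."""
    n_cols = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def knn_predict(train_rows, train_labels, query, k=3):
    """Case similarity: take the k stored cases closest to the query (Euclidean
    distance) and return the majority class among them."""
    cases = zip(train_rows, train_labels)
    nearest = sorted(cases, key=lambda case: math.dist(case[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


if __name__ == "__main__":
    # Synthetic toy data (not a UCI data set): 3 numeric features, one missing value.
    raw = [[1.0, 5.0, 2.0], [2.0, None, 1.0], [8.0, 4.0, 9.0], [9.0, 6.0, 8.0]]
    labels = [0, 0, 1, 1]

    cleaned = impute_missing(raw)
    bounds = fit_min_max(cleaned)
    scaled = [scale_row(r, bounds) for r in cleaned]
    keep = select_features(scaled, labels, k=2)
    train = [[r[j] for j in keep] for r in scaled]

    # A new case goes through the same scaling and feature subset
    # before it is compared with the stored cases.
    new_case = [7.0, 5.0, 7.5]
    query = [scale_row(new_case, bounds)[j] for j in keep]
    print("selected features:", keep)                             # prints: selected features: [0, 2]
    print("predicted class:", knn_predict(train, labels, query))  # prints: predicted class: 1
```

The scaling bounds are fitted on the training cases and reused for the new case, which keeps the new sample on the same scale as the stored cases. Applying the same column subset means similarity is measured only over the selected features. Any of the steps (imputation rule, scaler, selector, classifier) could be swapped for the approaches the survey compares.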
