Genetic algorithm wrapped Bayesian network feature selection applied to differential diagnosis of erythemato-squamous diseases

This paper presents a new method for the differential diagnosis of erythemato-squamous diseases based on Genetic Algorithm (GA) wrapped Bayesian Network (BN) Feature Selection (FS). To this end, a GA-based FS algorithm combined in parallel with a BN classifier is proposed. The erythemato-squamous dataset covers six dermatological diseases described by 34 features. In the GA-BN algorithm, the GA performs a heuristic search for the most relevant feature subset, i.e. the subset that maximizes the accuracy of the BN classifier under a 10-fold cross-validation strategy. The candidate feature subsets are evaluated sequentially: for each one, a BN is fitted to the corresponding data and used to identify the six dermatological diseases. With this approach, the algorithm achieves 99.20% classification accuracy in the diagnosis of erythemato-squamous diseases, making no more than 3 misclassifications out of 366 instances. The strength of the feature subset selected for the BN is further tested with Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Simple Logistics (SL) and Functional Decision Tree (FT) classifiers, which reach classification accuracies of 98.36%, 97.00%, 98.36% and 97.81%, respectively; the BN's 99.20% therefore remains the highest diagnostic performance among the tested classifiers. Furthermore, the FS power of the GA is compared with two alternative search algorithms, Best First (BF) and Sequential Floating (SF). Taken together, the results show that the proposed GA-BN based FS and prediction strategy is very promising for the diagnosis of erythemato-squamous diseases.
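The wrapper idea described above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the BN classifier is replaced by a discrete naive Bayes model (the simplest Bayesian network, in which the class is the only parent of each feature), the data are a hypothetical synthetic stand-in for the 366-instance, 34-feature dermatology set, and the GA settings (population size, number of generations, binary tournament selection, one-point crossover, bit-flip mutation, elitism) as well as all function names are assumptions. Each chromosome is a bit mask over the features, and its fitness is the 10-fold cross-validated accuracy of the wrapped classifier on the selected columns.

```python
import math
import random
from collections import Counter, defaultdict


def fit_naive_bayes(X, y, features, alpha=1.0):
    """Count-based discrete naive Bayes (a stand-in for the paper's BN):
    the class node is the only parent of every feature node."""
    class_counts = Counter(y)
    counts = defaultdict(Counter)  # counts[(class, feature)][value]
    values = defaultdict(set)      # observed values per feature
    for row, c in zip(X, y):
        for j in features:
            counts[(c, j)][row[j]] += 1
            values[j].add(row[j])
    return class_counts, counts, values, alpha, len(y)


def predict_naive_bayes(model, row, features):
    class_counts, counts, values, alpha, n = model

    def log_posterior(c):
        score = math.log(class_counts[c] / n)
        for j in features:
            # Laplace-smoothed estimate of P(feature j = row[j] | class c)
            num = counts[(c, j)][row[j]] + alpha
            den = class_counts[c] + alpha * len(values[j])
            score += math.log(num / den)
        return score

    return max(sorted(class_counts), key=log_posterior)


def cv_accuracy(X, y, features, k=10, seed=0):
    """Pooled k-fold cross-validated accuracy of the classifier on a feature subset."""
    if not features:
        return 0.0
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)  # fixed seed: every subset is scored on the same folds
    correct = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_naive_bayes([X[i] for i in train], [y[i] for i in train], features)
        correct += sum(predict_naive_bayes(model, X[i], features) == y[i] for i in test)
    return correct / len(y)


def ga_select(X, y, pop_size=12, generations=8, p_cross=0.6, p_mut=0.05, seed=1):
    """GA wrapper: search over feature bit masks, fitness = CV accuracy."""
    rng = random.Random(seed)
    d = len(X[0])
    cache = {}  # avoid re-running cross-validation for masks already scored

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = cv_accuracy(X, y, [j for j, bit in enumerate(mask) if bit])
        return cache[key]

    def tournament(pop):
        return max(rng.sample(pop, 2), key=fitness)  # binary tournament selection

    pop = [[rng.random() < 0.5 for _ in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)[:]]  # elitism: keep the best mask unchanged
        while len(nxt) < pop_size:
            a, b = tournament(pop), tournament(pop)
            if rng.random() < p_cross:  # one-point crossover
                cut = rng.randrange(1, d)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            nxt.append([not g if rng.random() < p_mut else g for g in child])  # bit-flip mutation
        pop = nxt
    best = max(pop, key=fitness)
    return [j for j, bit in enumerate(best) if bit], fitness(best)


def make_toy_data(n=180, d=8, seed=0):
    """Hypothetical toy data: six classes, features graded 0-3,
    and only features 0-2 carry class information."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        c = rng.randrange(6)
        row = [rng.randrange(4) for _ in range(d)]  # uninformative noise by default
        for j in range(3):
            if rng.random() < 0.9:
                row[j] = 3 * ((c >> j) & 1)  # bit j of the class label, scaled to 0 or 3
        X.append(row)
        y.append(c)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    selected, acc = ga_select(X, y)
    print("selected features:", selected, "10-fold CV accuracy: %.4f" % acc)
```

Fixing the cross-validation seed means every candidate subset is scored on identical folds, so fitness values are directly comparable. Re-checking a selected subset with other learners, as done in the paper with SVM, MLP, SL and FT, would amount to calling a different classifier inside `cv_accuracy` on the same column list.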
