Multi-Objective Evolutionary approach for the Performance Improvement of Learners using Ensembling Feature selection and Discretization Technique on Medical data

BACKGROUND Biomedical data are dominated by continuous, real-valued features. Such features tend to cause problems such as underfitting, the curse of dimensionality, and a higher misclassification rate because of increased variance. Pre-processing the dataset mitigates these side effects and has proven successful in maintaining adequate accuracy.
AIMS Feature selection and discretization are two essential preprocessing steps that have been used effectively to handle redundancy in biomedical data. However, previous work has not integrated feature selection and discretization into a unified approach to the data redundancy problem, which has left the field disjoint and fragmented. This paper proposes a novel multi-objective dimensionality reduction framework that combines discretization and feature reduction in a single ensemble model. The selection of optimal features, and the categorization of the selected features into discretized and non-discretized features, is governed by a multi-objective genetic algorithm (NSGA-II). The two objectives, minimizing the error rate during feature selection and maximizing the information gain during discretization, serve as the fitness criteria.
METHODS The proposed model uses a wrapper-based feature selection algorithm to select the optimal features and categorizes the selected features into two blocks: a discretized block and a non-discretized block. Features in the discretized block undergo binary discretization, while features in the second block are not discretized and are used in their original form.
RESULTS To validate the proposed ensemble model, experiments are conducted on fifteen medical datasets, and metrics such as accuracy, mean and standard deviation are computed to evaluate the performance of the classifiers.
CONCLUSION Extensive experiments on the datasets indicate that the proposed model improves the classification rate and outperforms the base learner.
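
The abstract does not give the chromosome encoding or the exact discretization procedure, so the following is only a minimal Python sketch of how the discretization side of the idea might look. It assumes a hypothetical per-feature gene (0 = drop, 1 = keep in original form, 2 = keep and binarize) and assumes that the binary cut point is the midpoint between adjacent distinct values that maximizes information gain. The names `best_binary_cut` and `apply_genes` are illustrative and do not come from the paper; the wrapper error-rate objective and the NSGA-II search itself are not shown.

```python
import math
from collections import Counter

# Hypothetical gene values; the paper's actual encoding is not stated.
DROP, KEEP_RAW, BINARIZE = 0, 1, 2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, cut):
    """Entropy reduction obtained by splitting `values` at `cut`."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_binary_cut(values, labels):
    """Return (cut, gain) for the midpoint cut with the highest gain.

    Returns (None, 0.0) when the feature is constant and cannot be split.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None, 0.0
    scored = ((c, information_gain(values, labels, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])


def apply_genes(rows, labels, genes):
    """Build the reduced dataset described by `genes`.

    Returns the transformed rows and the summed information gain of the
    binarized features (one possible form of the second objective).
    """
    kept = [j for j, g in enumerate(genes) if g != DROP]
    cuts, total_gain = {}, 0.0
    for j in kept:
        if genes[j] == BINARIZE:
            cut, gain = best_binary_cut([r[j] for r in rows], labels)
            if cut is not None:  # constant features stay in raw form
                cuts[j] = cut
                total_gain += gain
    reduced = [
        [(1 if r[j] > cuts[j] else 0) if j in cuts else r[j] for j in kept]
        for r in rows
    ]
    return reduced, total_gain
```

In a full pipeline, a multi-objective optimizer such as NSGA-II would evaluate each candidate gene vector on two values: the classifier's error rate on the reduced data (to be minimized) and a gain score like the one returned here (to be maximized).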
