A novel memetic algorithm for discovering knowledge in binary and multi class predictions based on support vector machine

Feature selection is an important factor that affects classification accuracy. It also serves as a backbone for dimensionality reduction, which can boost classification accuracy. To meet this requirement, a suitable feature selector is needed. This paper presents a novel memetic feature selection model named Shapley Value Embedded Genetic Algorithm (SVEGA) Feature Selector to solve these multi-objective feature selection tasks. The fitness value of each feature subset is measured by combining the genetic algorithm with Shapley value measures to predict the prominent features, and these measures are evaluated using a Support Vector Machine (SVM) with a kernel chosen for the binary or multi-class problem at hand. The fitness function optimises the specificity and sensitivity of the model and achieves higher prediction accuracy with fewer features.

In classification, every feature of the data set contributes to prediction accuracy and affects the model-building cost. To extract the priority features for prediction, a suitable feature selector is designed. This paper proposes a novel memetic feature selection model named Shapley Value Embedded Genetic Algorithm (SVEGA). The relevance of each feature to prediction is measured by combining genetic algorithms with the Shapley value measures retrieved from SVEGA. The obtained results are then evaluated using a Support Vector Machine (SVM) with different kernel configurations on 11 + 11 benchmark datasets (binary-class and multi-class). Finally, SVEGA-SVM is compared with other existing feature selection models. The experimental results show that the proposed setup gives robust outcomes, indicating that it is an efficient approach for discovering knowledge via feature selection, with improved classification accuracy compared to conventional methods.
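The idea of scoring features by their Shapley value can be illustrated with a minimal sketch. It is not the SVEGA implementation: `toy_value` is a hypothetical stand-in for the accuracy an SVM would reach on a feature subset, and `fitness` with its `penalty` parameter is an assumed way of combining sensitivity, specificity, and subset size, since the abstract does not give the actual formula.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_features, value):
    """Exact Shapley value of each feature for a coalition value function.

    `value` maps a frozenset of feature indices to a score. Exact enumeration
    is exponential in n_features, so this only suits small examples.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Probability weight of a coalition of this size preceding feature i.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(subset):
    """Hypothetical score; in practice this could be cross-validated SVM accuracy."""
    score = 0.5
    if 0 in subset:
        score += 0.2
    if 1 in subset:
        score += 0.1
    if 0 in subset and 1 in subset:
        score += 0.1  # features 0 and 1 interact
    return score  # feature 2 never changes the score


def fitness(sensitivity, specificity, n_selected, n_total, penalty=0.1):
    """Assumed fitness: mean of sensitivity and specificity, minus a size penalty."""
    return 0.5 * (sensitivity + specificity) - penalty * n_selected / n_total


phi = shapley_values(3, toy_value)
ranking = sorted(range(3), key=lambda i: phi[i], reverse=True)
```

In this toy setting feature 2 receives a Shapley value of zero, so a selector guided by these values would drop it first; a genetic algorithm could then use a function like `fitness` to compare candidate subsets.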
