A novel hybrid algorithm for feature selection

Feature selection is an important filtering step in data analysis, pattern classification, data mining, and related tasks. It reduces the number of features by removing irrelevant and redundant ones. In this paper, we propose a hybrid filter–wrapper feature subset selection algorithm called maximum Spearman minimum covariance cuckoo search (MSMCCS). First, we propose a filter algorithm based on Spearman correlation and covariance, called maximum Spearman minimum covariance (MSMC). Second, MSMC introduces three parameters that adjust the weights of correlation and redundancy, improving the relevance of the feature subsets and reducing their redundancy. Third, in the improved cuckoo search algorithm, a weighted combination strategy selects candidate feature subsets, a crossover mutation concept adjusts those candidates, and the filtered features are finally selected into optimal feature subsets. MSMCCS thus combines the efficiency of filters with the higher accuracy of wrappers. Experimental results on eight common data sets from the University of California at Irvine Machine Learning Repository showed that MSMCCS achieved better classification accuracy than seven wrapper methods, one filter method, and two hybrid methods. Furthermore, the proposed algorithm performed favorably on the Wilcoxon signed-rank test and the sensitivity–specificity test.
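
To make the filter stage concrete, the following is a minimal sketch of a greedy "maximum Spearman, minimum covariance" selection. It is an illustration under stated assumptions, not the MSMC formulation itself, which the abstract does not give. The function name `msmc_like_select`, the two weights `alpha` and `beta` (the paper uses three parameters, which are not specified here), the additive scoring rule, and the standardization of features before taking covariances are all assumptions introduced for this example.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float).ravel()
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)        # last rank of each block of tied values
    starts = ends - counts + 1      # first rank of each block
    return ((starts + ends) / 2.0)[inverse]


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    if ra.std() == 0 or rb.std() == 0:
        return 0.0                  # a constant input carries no rank information
    return float(np.corrcoef(ra, rb)[0, 1])


def msmc_like_select(X, y, k, alpha=1.0, beta=1.0):
    """Greedily pick k feature indices (hypothetical scoring rule).

    score(j) = alpha * |Spearman(feature j, y)|
               - beta * mean |cov(feature j, already selected features)|
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Assumption: standardize so covariances are comparable across features.
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    relevance = np.array([abs(spearman(Z[:, j], y)) for j in range(n_features)])
    cov = np.abs(np.cov(Z, rowvar=False))
    selected = [int(np.argmax(relevance))]   # start from the most relevant feature
    while len(selected) < min(k, n_features):
        remaining = [j for j in range(n_features) if j not in selected]
        scores = [alpha * relevance[j] - beta * cov[j, selected].mean()
                  for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected


# Toy data: feature 1 is a rescaled copy of feature 0 (fully redundant);
# feature 2 is less relevant to y but also less redundant with feature 0.
f0 = np.arange(1.0, 9.0)
X_toy = np.column_stack([f0, 2.0 * f0, [1, 0, 1, 0, 2, 1, 2, 1]])
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(msmc_like_select(X_toy, y_toy, k=2))
```

On this toy data the redundant copy is passed over in favor of the less correlated feature; setting `beta=0` removes the redundancy penalty and ranks purely by Spearman relevance. In a hybrid scheme such as the one described above, a subset produced this way would serve as input to the wrapper stage rather than as the final result.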
