Tuning parameter estimation in SCAD-support vector machine using firefly algorithm with application in gene selection and cancer classification

In cancer classification, gene selection is one of the most important bioinformatics topics. Gene selection can be treated as a variable selection problem: the aim is to find a small subset of genes that carries the most discriminative information for the classification target. The penalized support vector machine (PSVM) has proved effective at building a strong classifier that combines the advantages of the support vector machine with those of penalization, and PSVM with the smoothly clipped absolute deviation (SCAD) penalty is the most widely used variant. However, the effectiveness of PSVM with SCAD depends on choosing an appropriate value for the tuning parameter of the SCAD penalty. In this paper, a firefly algorithm, a metaheuristic for continuous optimization, is proposed to determine that tuning parameter. The proposed algorithm efficiently finds the most relevant genes while achieving high classification performance. Experimental results on four benchmark gene expression datasets show that the proposed algorithm outperforms competing methods in both classification accuracy and the number of selected genes.
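As a rough illustration of the two ingredients named above, the sketch below pairs the standard SCAD penalty (in its usual piecewise form, with the conventional default a = 3.7) with a minimal one-dimensional firefly search over the tuning parameter. The abstract does not give the fitness function, the search range, or the firefly settings, so `toy_cv_error`, the bounds, and the values of `beta0`, `gamma`, and `alpha` are all assumptions made for illustration; in practice the objective would be something like the cross-validated error of the SCAD-SVM fitted at a given value of the tuning parameter.

```python
import math
import random


def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty for a single coefficient (requires a > 2)."""
    t = abs(theta)
    if t <= lam:
        return lam * t                      # L1-like region near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2          # constant for large coefficients


def firefly_minimize(objective, lower, upper, n_fireflies=20, n_iter=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimal 1-D firefly search; all settings here are illustrative."""
    rng = random.Random(seed)
    xs = [rng.uniform(lower, upper) for _ in range(n_fireflies)]
    fs = [objective(x) for x in xs]
    best_f = min(fs)
    best_x = xs[fs.index(best_f)]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fs[j] < fs[i]:           # firefly j is brighter than i
                    r = abs(xs[i] - xs[j])
                    beta = beta0 * math.exp(-gamma * r * r)
                    noise = alpha * (rng.random() - 0.5) * (upper - lower)
                    x_new = xs[i] + beta * (xs[j] - xs[i]) + noise
                    xs[i] = min(max(x_new, lower), upper)  # stay in bounds
                    fs[i] = objective(xs[i])
                    if fs[i] < best_f:      # remember the best point seen
                        best_x, best_f = xs[i], fs[i]
        alpha *= 0.97                       # gradually shrink the random step
    return best_x, best_f


def toy_cv_error(lam):
    """Hypothetical stand-in for the cross-validated error at lam."""
    return (lam - 0.3) ** 2 + 0.05


best_lam, best_err = firefly_minimize(toy_cv_error, 0.0, 1.0)
print(best_lam, best_err)
```

Replacing `toy_cv_error` with a function that fits the penalized classifier and returns its validation error would turn this into a tuning loop of the kind the abstract describes, although the paper's actual design may differ.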
