Variable Selection in Count Data Regression Model based on Firefly Algorithm

Variable selection improves computational speed and prediction accuracy by identifying the variables most strongly related to the response variable. Count data regression modeling has received much attention in several scientific fields, where the Poisson and negative binomial regression models are the most basic models. The firefly algorithm is a recently proposed, efficient nature-inspired algorithm that can be employed for variable selection. In this work, the firefly algorithm is proposed to perform variable selection for count data regression models. Extensive simulation studies and two real data applications are conducted to evaluate the performance of the proposed method in terms of prediction accuracy and variable selection criteria, and its performance is compared with that of other methods. The results demonstrate the efficiency of the proposed method, which outperforms other popular methods.
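To make the idea concrete, the following is a minimal sketch of firefly-based variable selection for a Poisson regression model. It is not the authors' implementation: the abstract does not specify the fitness function, the binarization rule, or the parameter values, so the sketch assumes AIC as the fitness, a simple threshold (position > 0) to turn a continuous firefly position into a variable subset, and a small Newton/IRLS fit written in NumPy. The function names `poisson_aic` and `firefly_select`, the parameters `beta0`, `gamma`, and `alpha`, and the synthetic data are all hypothetical.

```python
import numpy as np

def poisson_aic(X, y, mask, n_iter=50):
    """AIC (up to a subset-independent constant) of a Poisson log-linear
    model fitted on an intercept plus the columns of X flagged in mask."""
    Z = np.column_stack([np.ones(X.shape[0]), X[:, mask]])
    beta = np.zeros(Z.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)          # start at the intercept-only fit
    for _ in range(n_iter):                    # Newton / IRLS iterations
        mu = np.exp(np.clip(Z @ beta, -20, 20))
        grad = Z.T @ (y - mu)
        hess = Z.T @ (Z * mu[:, None]) + 1e-8 * np.eye(Z.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = np.clip(Z @ beta, -20, 20)
    loglik = np.sum(y * eta - np.exp(eta))     # the log(y!) term is omitted
    return -2.0 * loglik + 2.0 * Z.shape[1]

def firefly_select(X, y, n_fireflies=10, n_gen=15,
                   beta0=1.0, gamma=0.1, alpha=0.5, seed=0):
    """Assumed binary firefly search: lower AIC means a brighter firefly."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pos = rng.normal(size=(n_fireflies, p))    # continuous positions
    to_mask = lambda v: v > 0.0                # assumed binarization rule
    cost = np.array([poisson_aic(X, y, to_mask(v)) for v in pos])
    best = int(np.argmin(cost))
    best_mask, best_cost = to_mask(pos[best]).copy(), cost[best]
    for _ in range(n_gen):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:          # move i toward the brighter j
                    r2 = np.sum((pos[i] - pos[j]) ** 2)
                    attract = beta0 * np.exp(-gamma * r2)
                    pos[i] += (attract * (pos[j] - pos[i])
                               + alpha * (rng.random(p) - 0.5))
                    cost[i] = poisson_aic(X, y, to_mask(pos[i]))
                    if cost[i] < best_cost:    # keep the best subset seen so far
                        best_mask, best_cost = to_mask(pos[i]).copy(), cost[i]
    return best_mask, best_cost

# Synthetic example (hypothetical data): only columns 0 and 2 affect the counts.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = rng.poisson(np.exp(0.3 + 0.6 * X[:, 0] - 0.5 * X[:, 2]))
mask, aic = firefly_select(X, y)
print("selected columns:", np.flatnonzero(mask), "AIC:", round(float(aic), 2))
```

The same search would apply to a negative binomial model by swapping the fitness function; only `poisson_aic` depends on the count distribution. A different criterion (for example BIC) or a probabilistic transfer function could be substituted in the same way.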
