A Novel Feature Selection Method on Mutual Information and Improved Gravitational Search Algorithm for High Dimensional Biomedical Data

In the past few decades, the field of bioinformatics has accumulated a large amount of gene expression data, which provides important support for disease diagnosis. However, high dimensionality, small sample sizes, and redundant features often degrade both the accuracy and the speed of prediction, and existing feature selection models cannot accurately extract the information in these datasets. Filter and wrapper methods are two commonly used approaches to feature selection. To combine the fast computation of the filter with the high accuracy of the wrapper, a new hybrid algorithm called MI-IBGSA is proposed, which hybridizes mutual information and an improved Gravitational Search Algorithm (GSA). First, mutual information is used to rank the features and select the important ones; the selected features are then used to form the population of the wrapper method. Then, because of its effectiveness, GSA is adopted to search further for an optimal feature subset. However, GSA suffers from slow search speed and premature convergence, which limit its optimization ability. In our work, three improvements are made: a scale function is added to the velocity update to enhance search ability; an adaptive ${k}_{best}$ particle update formula is proposed to improve convergence accuracy; and a fitness sharing strategy, based on the niche algorithm of fitness sharing, is proposed to enhance the randomness of the particle population and its search ability. We used 10-fold cross-validation with the KNN classifier to evaluate classification accuracy. Experimental results on five publicly available high-dimensional biomedical datasets show that the proposed MI-IBGSA outperforms the compared algorithms.
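The filter stage described above, ranking features by their mutual information with the class label and keeping the top ones, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes features have already been discretized (gene expression values are continuous, so some binning step would be needed first), and the names `mutual_information`, `rank_features`, `top_k`, and the toy data are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(rows, labels, top_k):
    """Score each feature column against the labels and return the top_k indices.

    Assumption: every value in `rows` is already discrete (e.g. binned).
    """
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:top_k], scores


# Toy data (hypothetical): feature 0 matches the label exactly,
# features 1 and 2 are statistically independent of it.
rows = [
    (0, 1, 0),
    (0, 0, 1),
    (1, 1, 0),
    (1, 0, 1),
]
labels = [0, 0, 1, 1]

selected, scores = rank_features(rows, labels, top_k=1)
print(selected, scores)
```

In a hybrid scheme of the kind the abstract describes, the indices returned by such a ranking step would define the reduced search space handed to the wrapper stage; how MI-IBGSA builds its initial population from them is not specified here.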
