A novel gene selection method using modified MRMR and hybrid bat-inspired algorithm with β-hill climbing

This paper proposes a new gene selection method that uses a modified Minimum Redundancy Maximum Relevancy (MRMR) as the filter approach and a hybrid bat algorithm with β-hill climbing as the wrapper approach. Gene selection is the process of selecting discriminative genes that aid the development of efficient cancer diagnosis and classification. In general, current filter-based approaches produce a gene subset according to its discriminative power. However, a deficiency of single-filter approaches is the high variability of their classification results. Accordingly, this study aims to improve MRMR by combining it with an ensemble of filters to increase its robustness and stability. The result of the filter-based approach is a set of discriminative genes, which the wrapper-based approach then uses to formulate the gene selection search space. In the wrapper approach, the bat algorithm is tailored to the gene selection problem and hybridized with a powerful local search method called β-hill climbing to further stress the exploitation side of the search space navigation and thus find a robust and stable set of discriminative genes. The bat-inspired algorithm (BA) is a recent swarm-based optimization method, while β-hill climbing is an exploratory local search. The proposed method is called Robust MRMR and Hybrid Bat-inspired Algorithm (rMRMR-HBA). To evaluate the proposed method, experiments are run on ten well-known microarray datasets that vary in the number of genes, samples, and classes. For performance evaluation, the proposed filter-based approach (i.e., rMRMR) is first tested against the standard MRMR and other well-regarded filter approaches. Thereafter, the wrapper-based approach (i.e., HBA) is evaluated by studying the convergence behavior of BA with and without β-hill climbing. For comparative evaluation, the results of the proposed rMRMR-HBA are compared with those of state-of-the-art methods on the same microarray datasets. The comparative results show that the proposed approach achieved outstanding results on two of the ten datasets in terms of classification accuracy and minimum number of genes.
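To make the local search component more concrete, the following is a minimal sketch of a β-hill climbing loop over a binary gene mask. It is an illustration under stated assumptions, not the authors' implementation: the function names, the single-bit-flip neighborhood, the default β value, and the toy fitness function (which stands in for classifier accuracy on the selected genes) are all hypothetical and are not given in the abstract.

```python
import random


def beta_hill_climbing(fitness, n_genes, beta=0.05, n_iters=500, seed=0):
    """Sketch of beta-hill climbing on a binary mask (1 = gene selected).

    Assumed operators (not specified in the abstract):
      - neighborhood step: flip one randomly chosen bit
      - beta step: with probability `beta`, reset each bit to a random value
      - selection step: keep the candidate if it is at least as fit
    """
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_genes)]
    current_fit = fitness(current)

    for _ in range(n_iters):
        candidate = current[:]

        # Neighborhood step: small local move around the current solution.
        i = rng.randrange(n_genes)
        candidate[i] = 1 - candidate[i]

        # Beta step: occasional random resets add exploration.
        for j in range(n_genes):
            if rng.random() < beta:
                candidate[j] = rng.randint(0, 1)

        # Selection step: greedy acceptance, so fitness never decreases.
        candidate_fit = fitness(candidate)
        if candidate_fit >= current_fit:
            current, current_fit = candidate, candidate_fit

    return current, current_fit


# Hypothetical stand-in for a wrapper objective: reward picking the
# "informative" genes and penalise the size of the selected subset.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for g in INFORMATIVE if mask[g] == 1)
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = beta_hill_climbing(toy_fitness, n_genes=20)
    print("selected genes:", [g for g, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

In a hybrid such as the one described, a routine like this would presumably be applied to solutions produced by the bat algorithm, with the toy objective replaced by a classifier-based evaluation of the selected genes.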
