MF-GARF: Hybridizing Multiple Filters and GA Wrapper for Feature Selection of Microarray Cancer Datasets

DNA microarray technology is a valuable advance in the medical field, but it raises challenges such as the curse of dimensionality and heavy storage and computational requirements. In this paper, we propose MF-GARF, a hybrid feature selection approach that combines multiple filters with a Genetic Algorithm (GA) wrapper and uses Random Forest as the fitness evaluator of feature subsets. MF-GARF comprises three phases. The first, the Relevancy block, applies the information-theoretic filters Information Gain, Gain Ratio, and Gini Index to retain relevant features and remove irrelevant and noisy ones. The second, the Redundancy block, uses Pearson correlation to remove redundancy among features. The third, the Optimization block, runs a GA wrapper with Random Forest as the fitness evaluator to generate an optimal feature subset with high predictive power. The classification accuracy of the selected feature subset is measured with Random Forest under 10-fold cross-validation. Experiments on 7 publicly available benchmark microarray cancer datasets show that the proposed algorithm achieves good accuracy with few selected features on all datasets. A comparison with other state-of-the-art hybrid techniques validates the effectiveness of the proposed approach.
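
The three-phase flow described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and several simplifications are assumptions made here: only Information Gain is used in the relevancy phase (Gain Ratio and Gini Index are omitted), features are binarized at their median, the correlation threshold of 0.9 and the GA settings are arbitrary, and `toy_fitness` is a hypothetical stand-in for the Random Forest cross-validation accuracy that the paper uses as fitness.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Information gain of one feature, binarized at its median (assumption)."""
    median = sorted(column)[len(column) // 2]
    gain = entropy(labels)
    for side in (True, False):
        subset = [y for x, y in zip(column, labels) if (x >= median) == side]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0

def relevancy_block(columns, labels, keep):
    """Phase 1: indices of the `keep` features with the highest gain."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: information_gain(columns[j], labels),
                    reverse=True)
    return ranked[:keep]

def redundancy_block(columns, ranked, threshold=0.9):
    """Phase 2: drop a feature that correlates strongly with a kept one."""
    kept = []
    for j in ranked:
        if all(abs(pearson(columns[j], columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept

def ga_wrapper(candidates, fitness, generations=20, pop_size=12, seed=0):
    """Phase 3: genetic search over bit masks of the candidate features."""
    if not candidates:
        return []
    rng = random.Random(seed)
    n = len(candidates)

    def decode(mask):
        return [f for f, bit in zip(candidates, mask) if bit]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(decode(m)), reverse=True)
        parents = pop[: pop_size // 2]               # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)                       # one bit-flip mutation
            child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    return decode(max(pop, key=lambda m: fitness(decode(m))))

# Toy data: 4 features (one list per feature) over 8 samples, 2 classes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1, 2, 1, 2, 8, 9, 8, 9],      # informative
    [2, 4, 2, 4, 16, 18, 16, 18],  # exact rescaling of feature 0 (redundant)
    [5, 1, 4, 2, 5, 1, 4, 2],      # noise
    [1, 1, 1, 9, 1, 9, 9, 9],      # weakly informative
]

def toy_fitness(subset):
    """Hypothetical fitness: reward feature 0, penalize subset size."""
    return (0 in subset) - 0.1 * len(subset)

ranked = relevancy_block(columns, labels, keep=3)   # [0, 1, 3]
kept = redundancy_block(columns, ranked)            # [0, 3]
best = ga_wrapper(kept, toy_fitness)
print(ranked, kept, best)
```

In a realistic setting, `toy_fitness` would be replaced by a function that trains a Random Forest on the decoded subset and returns its cross-validated accuracy; the pluggable `fitness` argument is a design choice of this sketch to keep the wrapper independent of the classifier.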
