Two-Stage Hybrid Gene Selection Using Mutual Information and Genetic Algorithm for Cancer Data Classification

Cancer is a deadly disease that requires complex and costly treatment. Microarray data classification plays an important role in cancer treatment, and an efficient gene selection technique that identifies the most promising genes is necessary for accurate cancer classification. Here, we propose a Two-Stage MI-GA Gene Selection algorithm for selecting informative genes in cancer data classification. In the first stage, Mutual Information (MI) based gene selection retains only the genes that carry high information about the cancer class. These high-MI genes are passed as input to the second stage, in which Genetic Algorithm (GA) based gene selection identifies the optimal subset of genes required for accurate classification. A Support Vector Machine (SVM) is used as the classifier. The proposed MI-GA approach is applied to the Colon, Lung, and Ovarian cancer datasets, and the results show that it achieves higher classification accuracy than existing methods.
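The first-stage MI filter can be illustrated with a short sketch. It is not the authors' implementation: it assumes continuous expression values are discretized into equal-width bins before estimating I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))], since the abstract does not say how MI is computed. The function names `discretize`, `mutual_information`, and `mi_filter`, the parameters `n_bins` and `top_k`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Map continuous expression values to equal-width bin indices 0..n_bins-1.

    Equal-width binning is an assumption; the abstract does not state how MI is estimated.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls into the last bin instead of bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """I(X;Y) in bits for two equal-length sequences of discrete symbols."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mi_filter(data, labels, top_k, n_bins=3):
    """Return indices of the top_k genes (columns of `data`) ranked by MI with the labels."""
    scores = []
    for g in range(len(data[0])):
        column = [row[g] for row in data]
        scores.append((mutual_information(discretize(column, n_bins), labels), g))
    # Highest MI first; ties broken by lower gene index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]


# Hypothetical toy data: 6 samples x 4 genes; gene 0 tracks the label, the rest are noise.
data = [
    [0.1, 5.0, 2.0, 1.0],
    [0.2, 1.0, 2.1, 3.0],
    [0.3, 3.0, 1.9, 2.0],
    [9.1, 4.0, 2.0, 1.5],
    [9.3, 2.0, 2.2, 2.5],
    [9.0, 5.0, 1.8, 3.5],
]
labels = [0, 0, 0, 1, 1, 1]
print(mi_filter(data, labels, top_k=2))  # prints [0, 1]
```

In the two-stage scheme, the indices returned by such a filter would form the candidate pool for the GA stage; `top_k` stands in for the MI cut-off, whose actual value the abstract does not give.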
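The second stage can be sketched as a binary-encoded genetic algorithm searching over the genes kept by the MI filter. This is an illustrative sketch under stated assumptions, not the paper's configuration: each chromosome is a bit mask over the candidate genes, fitness is leave-one-out accuracy minus a small per-gene penalty, and the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and every parameter value are assumed. To keep the example dependency-free, fitness uses a nearest-centroid classifier as a stand-in for the SVM the paper uses; in practice one could swap in scikit-learn's `SVC` scored with `cross_val_score` on the selected columns. The names `loo_centroid_accuracy`, `ga_select`, and their parameters are hypothetical.

```python
import random


def loo_centroid_accuracy(data, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`.

    A stand-in for the SVM used in the paper, chosen only to keep this sketch dependency-free.
    """
    if not genes:
        return 0.0
    n = len(data)
    correct = 0
    for i in range(n):
        # Class centroids computed from all samples except the held-out sample i.
        centroids = {}
        for c in sorted(set(labels)):
            rows = [data[j] for j in range(n) if j != i and labels[j] == c]
            if rows:
                centroids[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]
        x = [data[i][g] for g in genes]
        # Predict the class whose centroid is nearest in squared Euclidean distance.
        pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        correct += pred == labels[i]
    return correct / n


def ga_select(data, labels, candidates, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, size_penalty=0.01, seed=0):
    """Return the best gene subset (original column indices) found by a binary GA."""
    rng = random.Random(seed)
    n = len(candidates)
    cache = {}

    def decode(mask):
        return [g for g, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        # Accuracy minus a small penalty per selected gene favours compact subsets.
        key = tuple(mask)
        if key not in cache:
            genes = decode(mask)
            cache[key] = loo_centroid_accuracy(data, labels, genes) - size_penalty * len(genes)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if n > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    return decode(max(pop, key=fitness))


# Hypothetical toy data: gene 0 separates the classes, genes 1-3 are noise.
data = [
    [0.1, 5.0, 2.0, 1.0],
    [0.2, 1.0, 2.1, 3.0],
    [0.3, 3.0, 1.9, 2.0],
    [9.1, 4.0, 2.0, 1.5],
    [9.3, 2.0, 2.2, 2.5],
    [9.0, 5.0, 1.8, 3.5],
]
labels = [0, 0, 0, 1, 1, 1]
selected = ga_select(data, labels, candidates=[0, 1, 2, 3])
print(selected)
```

On this toy data every subset containing gene 0 reaches perfect leave-one-out accuracy, so the size penalty steers the search toward the smallest such subset. The penalty weight is the main design knob: set too high, it can discard genes that genuinely improve accuracy.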
