Classification of breast cancer using microarray gene expression data: A survey

Cancer, and breast cancer in particular, is among the most common causes of death worldwide according to the World Health Organization. For this reason, extensive research effort has gone into accurate and early diagnosis of cancer in order to increase the likelihood of cure. Among the available diagnostic tools, microarray technology has proven effective. It measures the expression levels of thousands of genes simultaneously. Although the large number of features (genes) in microarray data may seem advantageous, many of these features are irrelevant or redundant, which degrades classification accuracy. To overcome this challenge, feature selection is a necessary preprocessing step before classification. This paper reviews the main feature selection and classification techniques introduced in the literature for cancer, particularly breast cancer, with the aim of improving microarray-based classification.
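The preprocessing idea described above, selecting a small subset of informative genes before training a classifier, can be illustrated with a minimal sketch. It is not a method taken from this survey: the filter used here (a signal-to-noise score per gene), the nearest-centroid classifier, all function names, and the toy expression matrix are assumptions chosen only to keep the example short and dependency-free.

```python
from statistics import mean, stdev

def snr_scores(X, y):
    """Score each gene by |mean_1 - mean_0| / (sd_1 + sd_0) (hypothetical filter choice)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row in pos]
        b = [row[g] for row in neg]
        # Guard against a zero denominator when a gene is constant in both classes.
        denom = (stdev(a) + stdev(b)) or 1e-12
        scores.append(abs(mean(a) - mean(b)) / denom)
    return scores

def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring genes, best first."""
    scores = snr_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]

def fit_centroids(X, y, genes):
    """Compute one centroid per class, restricted to the selected genes."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    return centroids

def predict(centroids, sample, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    v = [sample[g] for g in genes]
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Toy data (invented for illustration): 6 samples x 4 genes.
# Only gene 2 separates the two classes; the others are noise.
X = [
    [5.0, 1.0, 9.1, 3.0],
    [5.2, 3.0, 8.9, 1.0],
    [4.9, 2.0, 9.0, 2.0],
    [5.1, 1.0, 2.1, 1.0],
    [5.0, 3.0, 1.9, 3.0],
    [4.8, 2.0, 2.0, 2.0],
]
y = [1, 1, 1, 0, 0, 0]

genes = select_top_k(X, y, k=1)        # filter step: keep the single best gene
model = fit_centroids(X, y, genes)     # classification step on the reduced data
print(genes, predict(model, [5.0, 2.0, 8.5, 2.0], genes))
```

On real microarray data the selection step should be fitted on the training samples only, since ranking genes on the full dataset before evaluation tends to inflate the reported accuracy.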
