Bi-stage hierarchical selection of pathway genes for cancer progression using a swarm based computational approach

Abstract

Background: Understanding the molecular mechanisms underlying carcinogenic expression is essential for early and accurate detection of the disease. Such understanding helps predict various types of irregularities and supports effective drug selection for treatment. Pathway information plays an important role in mapping genotype information to phenotype parameters. It helps to find co-regulated gene groups whose collective expression is strongly associated with cancer development.

Method: In this paper, we propose a bi-stage hierarchical swarm-based gene selection technique that combines two methods, both proposed here for the first time. The first is a multi-fitness discrete particle swarm optimization (MFDPSO) based feature selection procedure, which uses multiple fitness functions and a multi-filtering based gene selection procedure. On top of it, a new blended Laplacian artificial bee colony algorithm (BLABC) is proposed and used for automatic clustering of the genes selected by the first procedure. We performed 10 times 10-fold cross-validation and compared the proposed method with various statistical and swarm-based gene selection techniques on several popular cancer datasets.

Result: Experimental results show that the proposed method, taken as a whole, performs well overall. The MFDPSO-based system in combination with BLABC generates a good subset of pathway markers, which provides more effective insight into the gene-disease association with high accuracy and reliability.
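The evaluation protocol named in the abstract, 10 times 10-fold cross-validation, can be illustrated with a minimal sketch. The abstract does not say whether the folds were stratified or which classifier was used, so the code below only generates the train/test index splits. The function name `repeated_kfold`, the seed, and the sample count of 62 are assumptions made for illustration and are not taken from the paper.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: plain (non-stratified) splitting only.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        # Deal indices round-robin so fold sizes differ by at most one.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield repeat, f, train_idx, test_idx


# Assumed sample count, for illustration only.
splits = list(repeated_kfold(n_samples=62))
print(len(splits))  # 10 repeats x 10 folds
```

Each of the 100 splits would be used to select genes on the training indices only and to score the resulting classifier on the held-out indices; averaging over all splits gives the reported estimate. Running the selection step inside each training fold, rather than once on the full dataset, avoids leaking test samples into the selected gene subset.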
