An improved Binary Particle Swarm Optimization (iBPSO) for Gene Selection and Cancer Classification using DNA Microarrays

DNA microarrays enable the detection of genetic changes attributable to cancer by simultaneously analyzing the expression of thousands of genes. However, identifying the most relevant genes among the thousands of expression values available in each biological sample poses a great challenge for cancer classification. Although researchers have applied BPSO-based wrapper approaches to select the most relevant genes prior to cancer classification, these approaches did not achieve good classification accuracy because of premature convergence caused by the local stagnation problem. This paper proposes an improved Binary Particle Swarm Optimization (iBPSO) to tackle these issues. The proposed iBPSO-based wrapper is evaluated with Naive Bayes (NB), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM) classifiers under stratified 5-fold cross-validation. On six benchmark cancer microarray datasets, the proposed iBPSO demonstrated its efficacy relative to standard BPSO in terms of both classification accuracy and the number of selected genes. The proposed iBPSO also effectively escapes from local minima stagnation.
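
For orientation, the standard BPSO baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the usual binary PSO update (sigmoid transfer of a clamped velocity), not the iBPSO modification, which the abstract does not specify. The inertia weight `w`, the coefficients `c1` and `c2`, the clamp `v_max`, and the function `toy_fitness` are all assumptions made for the example; in a real wrapper the fitness would be the cross-validated accuracy of NB, kNN, or SVM on the selected genes.

```python
import math
import random


def sigmoid(v):
    """Transfer function mapping a velocity to a bit-set probability."""
    return 1.0 / (1.0 + math.exp(-v))


def bpso(fitness, n_genes, n_particles=20, n_iters=50,
         w=0.9, c1=2.0, c2=2.0, v_max=4.0, seed=0):
    """Standard binary PSO maximizing `fitness` over 0/1 gene masks.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_genes)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_genes):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so the sigmoid never saturates fully.
                vel[i][d] = max(-v_max, min(v_max, v))
                # Bit d is set with probability sigmoid(velocity).
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in for a wrapper fitness: rewards picking three
# "informative" genes and lightly penalizes the size of the subset.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for d in INFORMATIVE if mask[d])
    return hits / len(INFORMATIVE) - 0.01 * sum(mask)


best_mask, best_fit = bpso(toy_fitness, n_genes=10)
print(best_mask, round(best_fit, 3))
```

Because the global best only changes when a particle strictly improves on it, a swarm whose particles have all drifted toward the same mask can stall, which is the stagnation behavior the abstract attributes to standard BPSO.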
