A novel BPSO approach for gene selection and classification of microarray data

Selecting relevant genes from microarray data poses a huge challenge due to the high-dimensionality of the features, multi-class categories and a relatively small sample size. The main task of the classification process is to decrease the microarray data dimensionality. In order to analyze microarray data, an optimal subset of features (genes) which adequately represents the original set of features has to be found. In this study, we used a novel binary particle swarm optimization (NBPSO) algorithm to perform microarray data selection and classification. The K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) served as a classifier. The experimental results showed that the proposed method not only effectively reduced the number of gene expression levels, but also achieved lower classification error rates.

[1]  Yue Shi,et al.  A modified particle swarm optimizer , 1998, 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98TH8360).

[2]  Tao Li,et al.  A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression , 2004, Bioinform..

[3]  Songbo Tan,et al.  An effective refinement strategy for KNN text classifier , 2006, Expert Syst. Appl..

[4]  Ahmed Bouridane,et al.  Simultaneous feature selection and feature weighting using Hybrid Tabu Search/K-nearest neighbor classifier , 2007, Pattern Recognit. Lett..

[5]  S. Dudoit,et al.  Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data , 2002 .

[6]  Ramón Díaz-Uriarte,et al.  Gene selection and classification of microarray data using random forest , 2006, BMC Bioinformatics.

[7]  Yoonkyung Lee,et al.  Classification of Multiple Cancer Types by Multicategory Support Vector Machines Using Gene Expression Data , 2003, Bioinform..

[8]  S. Jeffrey,et al.  Genomics-based prognosis and therapeutic prediction in breast cancer. , 2005, Journal of the National Comprehensive Cancer Network : JNCCN.

[9]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[10]  Byung Ro Moon,et al.  Hybrid Genetic Algorithms for Feature Selection , 2004, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Werner Dubitzky,et al.  Instance-based concept learning from multiclass DNA microarray data , 2005, BMC Bioinformatics.

[12]  Xiangyang Wang,et al.  Feature selection based on rough sets and particle swarm optimization , 2007, Pattern Recognit. Lett..

[13]  R. Tibshirani,et al.  Diagnosis of multiple cancer types by shrunken centroids of gene expression , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[14]  Zixiang Xiong,et al.  Optimal number of features as a function of sample size for various classification rules , 2005, Bioinform..

[15]  Shinn-Ying Ho,et al.  Selecting a minimal number of relevant genes from microarray data to design accurate tissue classifiers , 2007, Biosyst..

[16]  Alan F. Murray,et al.  International Joint Conference on Neural Networks , 1993 .

[17]  J. M. Deutsch,et al.  Evolutionary algorithms for finding optimal gene sets in microarray prediction , 2003, Bioinform..

[18]  Hui-Ling Huang,et al.  ESVM: Evolutionary support vector machine for automatic feature selection and classification of microarray data , 2007, Biosyst..

[19]  Susan G Hilsenbeck,et al.  The promise of microarrays in the management and treatment of breast cancer , 2005, Breast Cancer Research.

[20]  E. Lander,et al.  A molecular signature of metastasis in primary solid tumors , 2003, Nature Genetics.

[21]  T. Sørlie,et al.  Genomics in breast cancer—therapeutic implications , 2005, Nature Clinical Practice Oncology.

[22]  Xiaoxing Liu,et al.  An Entropy-based gene selection method for cancer classification using microarray data , 2005, BMC Bioinformatics.

[23]  Igor V. Tetko,et al.  Gene selection from microarray data for cancer classification - a machine learning approach , 2005, Comput. Biol. Chem..

[24]  James Kennedy,et al.  Particle swarm optimization , 1995, Proceedings of ICNN'95 - International Conference on Neural Networks.

[25]  Russell C. Eberhart,et al.  A discrete binary version of the particle swarm algorithm , 1997, 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation.

[26]  J. L. Hodges,et al.  Discriminatory Analysis - Nonparametric Discrimination: Consistency Properties , 1989 .

[27]  Jae Won Lee,et al.  An extensive comparison of recent classification tools applied to microarray data , 2004, Comput. Stat. Data Anal..