A hybrid filter/wrapper approach of feature selection for gene expression data

In recent years, many studies have shown that microarray gene expression data are useful for disease identification and cancer classification. However, because gene expression data may contain thousands of genes, accurate microarray classification can be difficult. Feature (gene) selection is a frequently used pre-processing technique for classifying microarray gene expression data. Selecting a useful gene subset for the classifier not only decreases computational time and cost, but also increases classification accuracy. It is therefore important to extract only a small number of genes that are relevant to the classification of a particular cancer or disease type. In this paper, a correlation-based binary particle swarm optimization is proposed to select the relevant genes, and a K-nearest neighbor classifier with leave-one-out cross-validation is used to evaluate classification performance on six published cancer classification data sets. The experimental results show that the proposed method selects smaller gene subsets while achieving higher prediction accuracy than other methods in the literature.
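The wrapper stage described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the standard sigmoid-based binary particle swarm update and scores each candidate gene subset by leave-one-out K-nearest-neighbor accuracy. The function names (`loocv_knn_accuracy`, `binary_pso`), the parameter values (`w`, `c1`, `c2`, `v_max`, swarm size, iteration count), and the tie-break in favor of fewer genes are all illustrative assumptions, and the correlation-based filter stage is omitted.

```python
import math
import random


def loocv_knn_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of k-NN using only features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty gene subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from held-out sample i to every other sample.
        dists = sorted(
            (sum((X[i][j] - X[n][j]) ** 2 for j in cols), y[n])
            for n in range(len(X)) if n != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)


def binary_pso(X, y, n_particles=10, n_iter=20, w=0.9, c1=2.0, c2=2.0,
               v_max=6.0, seed=0):
    """Search for a 0/1 gene mask; all parameter defaults are assumed values."""
    rng = random.Random(seed)
    n_feat = len(X[0])

    def fitness(mask):
        # Assumption: compare by accuracy first, then prefer fewer selected genes.
        return (loocv_knn_accuracy(X, y, mask), -sum(mask))

    pos = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp the velocity
                # The sigmoid maps velocity to the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit[0]
```

Calling `binary_pso(X, y)` on a list of expression vectors `X` with class labels `y` returns the best mask found and its leave-one-out accuracy. In a hybrid scheme, a filter would first reduce the gene pool so that the swarm searches a much smaller space.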
