Gene Selection Using Wilcoxon Rank Sum Test and Support Vector Machine for Cancer Classification

Gene selection is an important problem in microarray data processing. This paper proposes a new gene selection method based on the Wilcoxon rank sum test and the Support Vector Machine (SVM). First, the Wilcoxon rank sum test is used to select a candidate gene subset. Then an SVM classifier with a linear kernel is trained and tested on each selected gene separately, and the genes with high testing accuracy are chosen to form the final reduced gene subset. Leave-one-out cross validation (LOOCV) classification results on two datasets, Breast Cancer and ALL/AML leukemia, demonstrate that the proposed method achieves a 100% success rate with the final reduced subset. The selected genes are listed and their expression levels are plotted, showing that the selected genes clearly separate the two classes.
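The two-stage procedure described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' implementation: the function names, the toy expression matrix, the subset size `k` and the accuracy threshold `min_acc` are all hypothetical. The rank sum statistic is scored with the usual normal approximation (midranks for ties, no tie correction of the variance), and the per-gene linear-kernel SVM is replaced by a nearest-class-mean threshold rule as a plainly labeled stand-in, so that the sketch stays dependency-free.

```python
from math import sqrt
from statistics import mean


def rank_sum_z(x, y):
    """Normal-approximation z score of the Wilcoxon rank sum of sample x
    within the pooled sample x + y (midranks for ties)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma


def select_by_rank_sum(X, y, k):
    """Stage 1: indices of the k genes with the largest |z| between classes."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append((abs(rank_sum_z(a, b)), g))
    scores.sort(key=lambda s: -s[0])  # stable sort keeps gene order on ties
    return [g for _, g in scores[:k]]


def loocv_accuracy(values, labels):
    """Stage 2: leave-one-out accuracy of a single gene.
    ASSUMPTION: a nearest-class-mean rule stands in for the linear SVM."""
    correct = 0
    for i, (v, label) in enumerate(zip(values, labels)):
        rest = [(u, l) for j, (u, l) in enumerate(zip(values, labels)) if j != i]
        m0 = mean(u for u, l in rest if l == 0)
        m1 = mean(u for u, l in rest if l == 1)
        predicted = 0 if abs(v - m0) <= abs(v - m1) else 1
        correct += predicted == label
    return correct / len(values)


# Hypothetical toy data: 6 samples (rows) x 4 genes (columns), 2 classes.
X = [
    [1.0, 4.0, 2.0, 0.3],
    [1.2, 4.5, 7.0, 0.9],
    [0.9, 5.2, 3.0, 0.1],
    [3.1, 5.0, 6.0, 0.8],
    [3.3, 5.5, 1.0, 0.2],
    [2.9, 6.0, 5.0, 0.7],
]
y = [0, 0, 0, 1, 1, 1]

candidates = select_by_rank_sum(X, y, k=2)
min_acc = 1.0  # hypothetical threshold for "high testing accuracy"
final_subset = [
    g for g in candidates
    if loocv_accuracy([row[g] for row in X], y) >= min_acc
]
print("candidates:", candidates, "final subset:", final_subset)
```

On this toy matrix only the first gene separates the two classes completely, so it is the only one that survives the second stage. In practice the stand-in rule would be swapped for a linear-kernel SVM from an established library, and the filter would more commonly rank genes by an exact or tie-corrected p-value.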
