Recursive gene selection based on maximum margin criterion: a comparison with SVM-RFE

Background:In class prediction problems using microarray data, gene selection is essential to improve the prediction accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVM-RFE) has become one of the leading methods and is being widely used. The SVM-based approach performs gene selection using the weight vector of the hyperplane constructed by the samples on the margin. However, the performance can be easily affected by noise and outliers, when it is applied to noisy, small sample size microarray data.Results:In this paper, we propose a recursive gene selection method using the discriminant vector of the maximum margin criterion (MMC), which is a variant of classical linear discriminant analysis (LDA). To overcome the computational drawback of classical LDA and the problem of high dimensionality, we present efficient and stable algorithms for MMC-based RFE (MMC-RFE). The MMC-RFE algorithms naturally extend to multi-class cases. The performance of MMC-RFE was extensively compared with that of SVM-RFE using nine cancer microarray datasets, including four multi-class datasets.Conclusion:Our extensive comparison has demonstrated that for binary-class datasets MMC-RFE tends to show intermediate performance between hard-margin SVM-RFE and SVM-RFE with a properly chosen soft-margin parameter. Notably, MMC-RFE achieves significantly better performance with a smaller number of genes than SVM-RFE for multi-class datasets. The results suggest that MMC-RFE is less sensitive to noise and outliers due to the use of average margin, and thus may be useful for biomarker discovery from noisy data.

[1]  S. Dhanasekaran,et al.  Delineation of prognostic biomarkers in prostate cancer , 2001, Nature.

[2]  Xiaodong Lin,et al.  Gene expression Gene selection using support vector machines with non-convex penalty , 2005 .

[3]  M. Bittner,et al.  Human prostate cancer and benign prostatic hyperplasia: molecular dissection by gene expression profiling. , 2001, Cancer research.

[4]  Alain Rakotomamonjy,et al.  Variable Selection Using SVM-based Criteria , 2003, J. Mach. Learn. Res..

[5]  Amnon Shashua,et al.  On the Relationship Between the Support Vector Machine for Classification and Sparsified Fisher's Linear Discriminant , 1999, Neural Processing Letters.

[6]  T. Poggio,et al.  Prediction of central nervous system embryonal tumour outcome based on gene expression , 2002, Nature.

[7]  Jieping Ye,et al.  Characterization of a Family of Algorithms for Generalized Discriminant Analysis on Undersampled Problems , 2005, J. Mach. Learn. Res..

[8]  Jian Yang,et al.  Why can LDA be performed in PCA transformed space? , 2003, Pattern Recognit..

[9]  Kohji Fukunaga,et al.  Introduction to Statistical Pattern Recognition-Second Edition , 1990 .

[10]  E. Lander,et al.  MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia , 2002, Nature Genetics.

[11]  Golub Gene H. Et.Al Matrix Computations, 3rd Edition , 2007 .

[12]  Stefan Michiels,et al.  Prediction of cancer outcome with microarrays: a multiple random validation strategy , 2005, The Lancet.

[13]  Cor J. Veenman,et al.  A protocol for building and evaluating predictors of disease state based on microarray data , 2005, Bioinform..

[14]  David G. Stork,et al.  Pattern classification, 2nd Edition , 2000 .

[15]  Nello Cristianini,et al.  Support vector machine classification and validation of cancer tissue samples using microarray expression data , 2000, Bioinform..

[16]  Daoqiang Zhang,et al.  Efficient and robust feature extraction by maximum margin criterion , 2003, IEEE Transactions on Neural Networks.

[17]  Xuegong Zhang,et al.  Kernel MSE algorithm: a unified framework for KFD, LS-SVM and KRR , 2001, IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222).

[18]  David G. Stork,et al.  Pattern Classification , 1973 .

[19]  S. Dudoit,et al.  Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data , 2002 .

[20]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[21]  Yiming Yang,et al.  Analysis of recursive gene selection approaches from microarray data , 2005, Bioinform..

[22]  Vojislav Kecman,et al.  Gene extraction for cancer diagnosis by support vector machines - An improvement , 2005, Artif. Intell. Medicine.

[23]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[24]  Bernhard Schölkopf,et al.  Use of the Zero-Norm with Linear Models and Kernel Methods , 2003, J. Mach. Learn. Res..

[25]  M. Čuperlović-Culf,et al.  Determination of tumour marker genes from gene expression data. , 2005, Drug discovery today.

[26]  E. Lander,et al.  Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.

[27]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[28]  Geoffrey J McLachlan,et al.  Selection bias in gene extraction on the basis of microarray gene-expression data , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[29]  G. Stolovitzky Gene selection in microarray data: the elephant, the blind men and our algorithms. , 2003, Current opinion in structural biology.

[30]  J. Welsh,et al.  Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. , 2001, Cancer research.

[31]  T. Poggio,et al.  Multiclass cancer diagnosis using tumor gene expression signatures , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[32]  Christian A. Rees,et al.  Systematic variation in gene expression patterns in human cancer cell lines , 2000, Nature Genetics.

[33]  Yudong D. He,et al.  Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.

[34]  Jeffrey A. Magee,et al.  Expression profiling reveals hepsin overexpression in prostate cancer. , 2001, Cancer research.

[35]  Tao Xiong,et al.  A combined SVM and LDA approach for classification , 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005..

[36]  Gene H. Golub,et al.  Matrix computations , 1983 .

[37]  T. Hastie,et al.  Classification of gene microarrays by penalized logistic regression. , 2004, Biostatistics.

[38]  M. Ringnér,et al.  Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks , 2001, Nature Medicine.

[39]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.