Hybrid huberized support vector machines for microarray classification

A large number of genes and a relatively small number of samples are typical characteristics of microarray data. These characteristics pose challenges for both sample classification and relevant gene selection. The support vector machine (SVM) is a widely used classification technique, and previous studies have demonstrated its superior classification performance in microarray analysis. However, a major limitation is that the SVM cannot perform automatic gene selection. To overcome this limitation, we propose the hybrid huberized support vector machine (HHSVM). The HHSVM uses the huberized hinge loss function and the elastic-net penalty. It has two major benefits: (1) automatic gene selection; and (2) the grouping effect, where highly correlated genes tend to be selected or removed together. We also develop an efficient algorithm that computes the entire regularized solution path for the HHSVM. We have applied our method to real microarray data and achieved promising results.
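To make the two ingredients named above concrete, the following is a minimal sketch of an objective that combines a huberized hinge loss with an elastic-net penalty. The abstract does not state the formulas, so the piecewise form of the loss, the smoothing parameter `delta`, the penalty weights `lam1` and `lam2`, and all function names are assumptions chosen for illustration, not the authors' implementation. The solution-path algorithm is not shown.

```python
def huberized_hinge(t, delta=2.0):
    """Huberized hinge loss at margin t = y * f(x).

    Assumed form: zero for t > 1, quadratic on (1 - delta, 1],
    linear for t <= 1 - delta. delta is a hypothetical smoothing parameter.
    """
    if t > 1.0:
        return 0.0
    if t > 1.0 - delta:
        return (1.0 - t) ** 2 / (2.0 * delta)
    return 1.0 - t - delta / 2.0


def elastic_net_penalty(beta, lam1, lam2):
    """Elastic-net penalty: lam1 * ||beta||_1 + (lam2 / 2) * ||beta||_2^2."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam1 * l1 + 0.5 * lam2 * l2_squared


def hhsvm_objective(X, y, beta0, beta, lam1, lam2, delta=2.0):
    """Total loss over samples plus the penalty on the gene coefficients.

    X is a list of expression vectors (one per sample), y holds labels in
    {-1, +1}, and the classifier is f(x) = beta0 + sum_j beta[j] * x[j].
    The intercept beta0 is left unpenalized.
    """
    loss = 0.0
    for xi, yi in zip(X, y):
        f = beta0 + sum(b * v for b, v in zip(beta, xi))
        loss += huberized_hinge(yi * f, delta)
    return loss + elastic_net_penalty(beta, lam1, lam2)
```

In this sketch the L1 term is what can drive individual coefficients exactly to zero (gene selection), while the squared L2 term is what encourages correlated genes to receive similar coefficients (the grouping effect).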
