Network-based support vector machine for classification of microarray samples

Background: The importance of network-based approaches to identifying biological markers for diagnostic classification and prognostic assessment in the context of microarray data has been increasingly recognized. To our knowledge, there have been few, if any, statistical tools that explicitly incorporate prior information about gene networks into classifier building. The main idea of this paper is to take full advantage of the biological observation that neighboring genes in a network tend to function together in biological processes, and to embed this information into a formal statistical framework.

Results: We propose a network-based support vector machine for binary classification problems. Its penalty term is constructed by applying the F∞-norm to pairwise gene neighbors, with the aim of improving both predictive performance and gene selection. Simulation studies in both low- and high-dimensional data settings, as well as two real microarray applications, indicate that the proposed method is able to identify more clinically relevant genes while maintaining a sparse model, with prediction accuracy similar to or higher than that of the standard and the L1-penalized support vector machines.

Conclusion: The proposed network-based support vector machine has the potential to be a practically useful classification tool for microarrays and other high-dimensional data.
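
The penalty described in the Results can be illustrated with a minimal sketch. The abstract does not give the exact objective, so the following rests on assumptions: the loss is the standard SVM hinge loss, the penalty is an unweighted sum of max(|β_j|, |β_k|) over each pair of neighboring genes (j, k), and the function name, argument names, and toy data are hypothetical. The published method may weight or scale these terms differently.

```python
def network_svm_objective(X, y, beta, beta0, edges, lam):
    """Hinge loss plus an F-infinity-norm penalty over neighboring gene pairs.

    Illustrative sketch only; the unweighted penalty is an assumption.
    X: list of samples (each a list of gene expression values)
    y: list of labels in {-1, +1}
    beta, beta0: gene coefficients and intercept
    edges: list of (j, k) index pairs for neighboring genes in the network
    lam: non-negative tuning parameter
    """
    hinge = 0.0
    for xi, yi in zip(X, y):
        score = beta0 + sum(b * v for b, v in zip(beta, xi))
        hinge += max(0.0, 1.0 - yi * score)
    # The F-infinity norm of a pair (beta_j, beta_k) is max(|beta_j|, |beta_k|),
    # so the penalty on an edge vanishes only when both coefficients are zero.
    penalty = sum(max(abs(beta[j]), abs(beta[k])) for j, k in edges)
    return hinge + lam * penalty


# Toy example: 2 samples, 3 genes, network edges 0-1 and 1-2 (made-up values).
X = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
y = [1, -1]
beta = [0.5, -0.25, 0.0]
edges = [(0, 1), (1, 2)]
print(network_svm_objective(X, y, beta, 0.0, edges, lam=2.0))  # 2.75
```

Because each edge term is a maximum, shrinking only one coefficient of a neighboring pair does not reduce that term. This is one way to see why such a penalty would tend to keep or drop neighboring genes together.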
