Support Vector Machine Classification of Microarray Gene Expression Data
UCSC-CRL-99-09

We introduce a new method of functionally classifying genes using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). We describe SVMs that use different similarity metrics including a simple dot product of gene expression vectors, polynomial versions of the dot product, and a radial basis function. Compared to the other SVM similarity metrics, the radial basis function SVM appears to provide superior performance in identifying sets of genes with a common function using expression data. In addition, SVM performance is compared to four standard machine learning algorithms. SVMs have many features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers.
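
The three similarity metrics named above can be illustrated with a minimal sketch. The abstract does not give the exact functional forms, so the ones below are the common textbook versions and are assumptions: a polynomial kernel of the form (x·y + 1)^d and a Gaussian radial basis function with width sigma. The function names, the parameters `degree` and `sigma`, and the toy expression vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def dot_kernel(x, y):
    """Plain dot product of two gene expression vectors."""
    return float(np.dot(x, y))


def poly_kernel(x, y, degree=2):
    """Polynomial version of the dot product (assumed form: (x.y + 1)^degree)."""
    return float((np.dot(x, y) + 1.0) ** degree)


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian radial basis function (assumed form: exp(-||x - y||^2 / (2 sigma^2)))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def gram_matrix(X, kernel, **params):
    """Pairwise similarity matrix for the rows of X under the given kernel."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j], **params)
    return K


# Toy data: three genes measured under three conditions (made-up values).
expr = np.array([
    [1.0, 0.0, -1.0],
    [0.5, 0.5, -1.0],
    [-1.0, 1.0, 0.0],
])

K_rbf = gram_matrix(expr, rbf_kernel, sigma=1.0)
```

A matrix such as `K_rbf` is what a kernel-based SVM trainer would consume in place of the raw expression vectors; swapping `rbf_kernel` for `dot_kernel` or `poly_kernel` changes the similarity metric without changing anything else in the pipeline.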
