Knowledge-based analysis of microarray gene expression data by using support vector machines.

We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify genes of unknown function that share a function with known genes, based on expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.
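
The flexibility in choosing a similarity function can be illustrated with a minimal sketch. It assumes each gene is represented as a vector of expression measurements and shows three similarity functions (kernels) commonly used with SVMs: a dot product, a polynomial kernel, and a radial basis function kernel. The abstract does not name the specific metrics tested, so the kernel choices, parameter values, gene names, and expression values below are all illustrative assumptions, not the authors' settings or data.

```python
import math

def linear_kernel(x, y):
    """Dot-product similarity between two expression profiles."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2):
    """Polynomial similarity: (x . y + 1) ** degree (degree is an assumed value)."""
    return (linear_kernel(x, y) + 1.0) ** degree

def rbf_kernel(x, y, width=1.0):
    """Radial basis similarity: exp(-||x - y||^2 / (2 * width^2)) (width is an assumed value)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def gram_matrix(profiles, kernel):
    """Pairwise similarity matrix; an SVM sees the data only through this matrix."""
    return [[kernel(p, q) for q in profiles] for p in profiles]

# Hypothetical expression profiles (made-up values, three conditions per gene).
gene_a = [1.0, 2.0, -1.0]
gene_b = [1.0, 2.0, -1.0]   # identical profile to gene_a
gene_c = [-1.0, 0.0, 2.0]   # dissimilar profile

for name, kernel in [("linear", linear_kernel),
                     ("polynomial", polynomial_kernel),
                     ("rbf", rbf_kernel)]:
    print(name, gram_matrix([gene_a, gene_b, gene_c], kernel))
```

Because the classifier depends on the data only through the pairwise similarity matrix, swapping the kernel changes the notion of "similar expression" without changing the rest of the training procedure.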
