Whole-Genome Functional Classification of Genes by Latent Semantic Analysis on Microarray Data

Microarray experiments now make it possible to monitor, quantitatively and simultaneously, the expression levels of thousands of genes under various experimental conditions. The resulting data are very useful for elucidating functional relationships among the genes of a genome. However, because of the experimental and biological nature of the data, whole-genome functional classification of genes from microarray data remains a challenging machine learning problem. In this paper, we apply latent semantic analysis (LSA) to microarray expression data for systematic, genome-wide functional classification of genes. In the LSA approach considered here, singular value decomposition (SVD) is first applied to the gene expression data as a dimension-reducing step, followed by an unsupervised clustering procedure based on vector similarities in the truncated SVD space. Each resulting gene cluster is then functionally classified by majority vote: the unannotated genes in a cluster are assigned the functional class most common among its annotated members. Using this semi-supervised LSA approach, we have systematically classified the genes of the partially annotated yeast genome, assigning more than 1,700 genes of unknown function to 40 distinct functional classes with promising results.
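
The three steps described above (truncated SVD, similarity-based clustering, and majority calling) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the clustering algorithm, so spherical k-means on cosine similarity is used here as a stand-in. Genes are represented by the rows of U_k S_k, one common LSA convention. The function names, the choice of k, and the toy expression matrix and class labels are all hypothetical.

```python
import numpy as np
from collections import Counter


def truncated_gene_vectors(X, k):
    """Map each gene (a row of the genes x conditions matrix X) into the
    k-dimensional truncated SVD space, returning the rows of U_k * S_k."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def spherical_kmeans(V, n_clusters, n_iter=50, seed=0):
    """Cluster rows of V by cosine similarity. Spherical k-means is an
    ASSUMED stand-in for the unspecified clustering procedure."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    Vn = V / np.where(norms == 0, 1.0, norms)  # unit rows; zero rows stay zero
    rng = np.random.default_rng(seed)
    centroids = Vn[rng.choice(len(Vn), n_clusters, replace=False)]
    assign = None
    for _ in range(n_iter):
        # Assign every gene to the centroid with the highest cosine similarity.
        new_assign = (Vn @ centroids.T).argmax(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # assignments stopped changing
        assign = new_assign
        for c in range(n_clusters):
            members = Vn[assign == c]
            if len(members):
                mean = members.sum(axis=0)
                norm = np.linalg.norm(mean)
                if norm > 0:
                    centroids[c] = mean / norm  # re-normalized mean direction
    return assign


def classify_by_majority(assign, labels):
    """Give each unannotated gene (label None) the most common known
    label in its cluster; clusters with no annotated genes are skipped."""
    predicted = list(labels)
    for c in np.unique(assign):
        idx = np.flatnonzero(assign == c)
        known = [labels[i] for i in idx if labels[i] is not None]
        if not known:
            continue  # nothing to vote with; leave these genes unclassified
        majority = Counter(known).most_common(1)[0][0]
        for i in idx:
            if predicted[i] is None:
                predicted[i] = majority
    return predicted


# Hypothetical toy data: 6 genes x 4 conditions with two expression patterns.
X = np.array([
    [5.0, 4.0, 0.1, 0.2],
    [4.5, 4.2, 0.0, 0.3],
    [5.2, 3.9, 0.2, 0.1],
    [0.1, 0.3, 4.8, 5.1],
    [0.2, 0.1, 5.0, 4.7],
    [0.0, 0.2, 4.9, 5.3],
])
labels = ["ribosome", "ribosome", None, "respiration", None, "respiration"]

V = truncated_gene_vectors(X, k=2)
clusters = spherical_kmeans(V, n_clusters=2)
predicted = classify_by_majority(clusters, labels)
print(predicted)
```

On this toy matrix the two expression patterns separate cleanly in the 2-dimensional truncated space, so each unannotated gene inherits the majority class of its cluster. Normalizing the gene vectors before clustering makes the similarity measure cosine, a common choice in LSA; clusters that contain no annotated genes leave their members unclassified rather than guessing.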
