Dynamic Integration of Gene Annotation and its Application to Microarray Analysis

Comprehensive and structured annotations for all genes on a microarray chip are essential for interpreting its expression data. Currently, most chip gene annotations are one-line free-text descriptions that are often partial, outdated, and unsuitable for large-scale data analysis, which limits the interpretation of microarray gene expression clusters. Although researchers can manually navigate a collection of databases for better annotations, this is practical only for a limited number of genes. Existing meta-databases fail to provide comprehensive, categorized annotations for hundreds of genes simultaneously. We have developed an automatic system, GeneView, to address this issue. GeneView monitors various data sources, extracts gene information from a source whenever it is updated, comprehensively matches genes across sources, and integrates the information into a central database by category, such as pathway, genetic mapping, phenotype, expression profile, domain structure, protein interaction, disease association, and references. The system consists of four major components: (1) a relational database; (2) data processing; (3) user curation; and (4) data query. We evaluated it by analyzing genes on cDNA and Affymetrix oligonucleotide chips. In both cases, the system provided more accurate and comprehensive information than that provided by the vendors or the chip users, and helped identify new common functions among genes in the same expression clusters.
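
The integration step described above (match genes across sources, then file each annotation under a category) can be illustrated with a minimal sketch. This is not the GeneView implementation: the record layout, the alias table, and the function names `canonical_id`, `integrate`, and `shared_annotations` are all hypothetical, and an in-memory dictionary stands in for the relational database.

```python
from collections import defaultdict

# Hypothetical source records: (source name, gene identifier, category, annotation).
RECORDS = [
    ("source_a", "GENE1", "pathway", "glycolysis"),
    ("source_b", "gene1_alias", "disease association", "example disorder"),
    ("source_b", "GENE2", "pathway", "glycolysis"),
]

# Hypothetical alias table used to match identifiers from different sources.
ALIASES = {"gene1_alias": "GENE1"}


def canonical_id(identifier, aliases):
    """Map a source-specific identifier to one canonical gene identifier."""
    return aliases.get(identifier, identifier.upper())


def integrate(records, aliases):
    """Group annotations by canonical gene, then by category."""
    store = defaultdict(lambda: defaultdict(set))
    for _source, identifier, category, annotation in records:
        store[canonical_id(identifier, aliases)][category].add(annotation)
    return store


def shared_annotations(store, genes, category):
    """Return annotations in one category that every listed gene has."""
    sets = [store.get(g, {}).get(category, set()) for g in genes]
    return set.intersection(*sets) if sets else set()
```

Under these assumptions, calling `shared_annotations` on the genes of one expression cluster returns the annotations they have in common within a category, which is the kind of query that could surface a common function for a cluster.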
