Ontology annotation: mapping genomic regions to biological function.

With numerous whole genomes now in hand, and experimental data about genes and biological pathways accumulating, a systems approach to biological research is becoming essential. Ontologies provide a formal representation of knowledge that is amenable to computational as well as human analysis, and so are an obvious underpinning of systems biology. Mapping function to gene products in the genome involves two somewhat intertwined enterprises: ontology building and ontology annotation. Ontology building is the formal representation of a domain of knowledge; ontology annotation is the association of specific genomic regions (which we refer to simply as 'genes', a term covering genes, their regulatory elements, and their products such as proteins and functional RNAs) with parts of the ontology. We consider two complementary representations of gene function: the Gene Ontology (GO) and pathway ontologies. GO represents function from the gene's eye view, in relation to a large and growing context of biological knowledge at all levels. Pathway ontologies represent function from the point of view of biochemical reactions and interactions, which are ordered into networks and causal cascades. The more mature GO provides an example of ontology annotation: how conclusions from the scientific literature and from evolutionary relationships are converted into formal statements about gene function. Annotations are made using a variety of types of evidence, which can be used to estimate the relative reliability of different annotations.
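
The idea of ontology annotation, and of weighting annotations by their evidence, can be illustrated with a minimal sketch. It assumes a toy ontology of a handful of terms (hypothetical, not the real GO graph) in which a term may have several parents, so the ontology is a directed acyclic graph rather than a tree. Each annotation links a hypothetical gene to a term together with an evidence code. The codes IDA, IMP, ISS and IEA are real GO evidence codes, but the numeric reliability ranks assigned to them here are an illustrative assumption. A query for a term also returns genes annotated to its more specific descendants, following GO's convention that an annotation to a term implies annotation to all of its ancestors.

```python
from dataclasses import dataclass

# Toy ontology (hypothetical, NOT the real GO graph): term -> parent terms.
# "rRNA processing" has two parents, so the structure is a DAG, not a tree.
PARENTS: dict[str, set[str]] = {
    "biological_process": set(),
    "metabolic process": {"biological_process"},
    "RNA processing": {"metabolic process"},
    "ribosome biogenesis": {"biological_process"},
    "rRNA processing": {"RNA processing", "ribosome biogenesis"},
}

# Assumed reliability ranking (higher = more reliable). The codes are real
# GO evidence codes; the numbers are illustrative only.
EVIDENCE_RANK: dict[str, int] = {"IDA": 3, "IMP": 3, "ISS": 2, "IEA": 1}


@dataclass(frozen=True)
class Annotation:
    gene: str      # hypothetical gene identifier
    term: str      # ontology term the gene is annotated to
    evidence: str  # evidence code supporting the annotation


def ancestors(term: str) -> set[str]:
    """Return the term itself plus every term reachable via parent edges."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def genes_for_term(term: str, annotations: list[Annotation], min_rank: int = 0) -> set[str]:
    """Genes annotated to `term` or to any of its descendants, keeping only
    annotations whose evidence rank is at least `min_rank`."""
    return {
        a.gene
        for a in annotations
        if EVIDENCE_RANK.get(a.evidence, 0) >= min_rank and term in ancestors(a.term)
    }


annotations = [
    Annotation("geneA", "rRNA processing", "IDA"),
    Annotation("geneB", "RNA processing", "ISS"),
    Annotation("geneC", "ribosome biogenesis", "IEA"),
]

print(sorted(genes_for_term("ribosome biogenesis", annotations)))  # ['geneA', 'geneC']
print(sorted(genes_for_term("metabolic process", annotations, min_rank=2)))  # ['geneA', 'geneB']
```

Raising `min_rank` drops the electronically inferred (IEA) annotation, which is one simple way evidence types could be used to trade coverage for reliability; a real system would read the ontology and annotations from curated files rather than hard-coding them.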
