Resources for functional annotation

The continued success of genome sequencing projects has led to an explosion in the availability of sequence data. The Genomes On Line Database (GOLD) currently lists more than 2000 ongoing and completed genome projects, and this number is continuously increasing (Liolios et al. 2008). For this sequence information to be useful in formulating and testing biological hypotheses, the genome sequences must be adequately annotated.

[1] Amos Bairoch, et al. The PROSITE database, 2005, Nucleic Acids Res.

[2] Hiren J. Joshi, et al. GlycoSuiteDB: a curated relational database of glycoprotein glycan structures and their biological sources. 2003 update, 2003, Nucleic Acids Res.

[3] Rolf Apweiler, et al. InterProScan - an integration platform for the signature-recognition methods in InterPro, 2001, Bioinform.

[4] David Haussler, et al. The UCSC genome browser database: update 2007, 2006, Nucleic Acids Res.

[5] W. Wasserman, et al. GeneLynx: a gene-centric portal to the human genome, 2001, Genome Research.

[6] John S. Garavelli, et al. The RESID Database of Protein Modifications as a resource and annotation tool, 2004, Proteomics.

[7] Peer Bork, et al. SMART 5: domains in the context of genomes and networks, 2005, Nucleic Acids Res.

[8] Haruki Nakamura, et al. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data, 2006, Nucleic Acids Res.

[9] Y. Zhang, et al. IntAct - open source resource for molecular interaction data, 2006, Nucleic Acids Res.

[10] Nan Guo, et al. PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways, 2006, Nucleic Acids Res.

[11] Michelle G. Giglio, et al. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes, 2006, Nucleic Acids Res.

[12] D. Haussler, et al. Hidden Markov models in computational biology. Applications to protein modeling, 1993, Journal of Molecular Biology.

[13] Masaru Tomita, et al. Restauro-G: A Rapid Genome Re-Annotation System for Comparative Genomics, 2007, Genom. Proteom. Bioinform.

[14] I-Min A. Chen, et al. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata, 2007, Nucleic Acids Res.

[15] Tatiana Tatusova, et al. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins, 2004, Nucleic Acids Res.

[16] Hideaki Sugawara, et al. DNA Data Bank of Japan (DDBJ) in XML, 2003, Nucleic Acids Res.

[17] Terri K. Attwood, et al. PRINTS and its automatic supplement, prePRINTS, 2003, Nucleic Acids Res.

[18] Peter B. McGarvey, et al. UniRef: comprehensive and non-redundant UniProt reference clusters, 2007, Bioinform.

[19] Tsviya Olender, et al. GeneCards™ 2002: towards a complete, object-oriented, human gene compendium, 2002, Bioinform.

[20] M. Olivier. A haplotype map of the human genome, 2003, Nature.

[21] E. Birney, et al. The International Protein Index: An integrated database for proteomics experiments, 2004, Proteomics.

[22] Cyrus Chothia, et al. The SUPERFAMILY database in 2007: families and functions, 2006, Nucleic Acids Res.

[23] V. McKusick. Mendelian Inheritance in Man and Its Online Version, OMIM, 2007, The American Journal of Human Genetics.

[24] Robert D. Finn, et al. New developments in the InterPro database, 2007, Nucleic Acids Res.

[25] Rolf Apweiler, et al. Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT, 2001, Bioinform.

[26] Alexander V. Diemand, et al. The Swiss-Prot variant page and the ModSNP database: A resource for sequence and structure information on human protein variants, 2004, Human Mutation.

[27] Emily Dimmer, et al. The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology, 2004, Nucleic Acids Res.

[28] Anne-Lise Veuthey, et al. Automated annotation of microbial proteomes in SWISS-PROT, 2003, Comput. Biol. Chem.

[29] Torsten Schwede, et al. The SWISS-MODEL Repository: new features and functionalities, 2005, Nucleic Acids Res.

[30] Rolf Apweiler, et al. A novel method for automatic functional annotation of proteins, 1999, Bioinform.

[31] Sameer Velankar, et al. E-MSD: improving data deposition and structure quality, 2005, Nucleic Acids Res.

[32] Rolf Apweiler, et al. UniProt archive, 2004, Bioinform.

[33] Rolf Apweiler, et al. InterProScan: protein domains identifier, 2005, Nucleic Acids Res.

[34] C. Ouzounis, et al. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes, 2005, Nucleic Acids Res.

[35] Sébastien Carrère, et al. The ProDom database of protein domain families: more emphasis on 3D, 2004, Nucleic Acids Res.

[36] Kimberly Van Auken, et al. WormBase: new content and better access, 2006, Nucleic Acids Res.

[37] Vincent Lombard, et al. The EMBL Nucleotide Sequence Database: major new developments, 2003, Nucleic Acids Res.

[38] Kara Dolinski, et al. Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms, 2004, Nucleic Acids Res.

[39] Madeline A. Crosby, et al. FlyBase: genomes by the dozen, 2006, Nucleic Acids Res.

[40] M. Gribskov, et al. Profile analysis, 1990.

[41] Ingmar Reuter, et al. Integr8 and Genome Reviews: integrated views of complete genomes and proteomes, 2004, Nucleic Acids Res.

[42] Robert D. Finn, et al. Pfam: clans, web tools and services, 2005, Nucleic Acids Res.

[43] Ron D. Appel, et al. ExPASy: the proteomics server for in-depth protein knowledge and analysis, 2003, Nucleic Acids Res.

[44] Robert S. Ledley, et al. PIRSF: family classification system at the Protein Information Resource, 2004, Nucleic Acids Res.

[45] Tim J. P. Hubbard, et al. SCOP database in 2004: refinements integrate structure and sequence family data, 2004, Nucleic Acids Res.

[46] Sean R. Eddy, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, 1998.

[47] M. Mann, et al. Global, In Vivo, and Site-Specific Phosphorylation Dynamics in Signaling Networks, 2006, Cell.

[48] Frances M. G. Pearl, et al. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution, 2006, Nucleic Acids Res.

[49] David A. Lee, et al. Gene3D: modelling protein structure, function and evolution, 2005, Nucleic Acids Res.