VMD: a community annotation database for oomycetes and microbial genomes

The VBI Microbial Database (VMD) is a database system designed to host a range of microbial genome sequences. At present, the database contains genome sequence and annotation data of two plant pathogens Phytophthora sojae and Phytophthora ramorum. With the completion of the draft genome sequences of these pathogens in collaboration with the DOE Joint Genome Institute (JGI), we have created this resource to make the sequences publicly available. The genome sequences (95 MB for P.sojae and 65 MB for P.ramorum) were annotated with ∼19 000 and ∼16 000 gene models, respectively. We used two different statistical methods to validate these gene models, Fickett's and a log-likelihood method. Functional annotation of the gene models is based on results from BlastX and InterProScan screens. From the InterProScan results, we could assign putative functions to 17 694 genes in P.sojae and 14 700 genes in P.ramorum. We created an easy-to-use genome browser to view the genome sequence data, which opens to detailed annotation pages for each gene model. A community annotation interface is available for registered community members to add or edit annotations. There are ∼ 1600 gene models for P.sojae and ∼700 models for P.ramorum that have already been manually curated. A toolkit is provided as an additional resource for users to perform a variety of sequence analysis jobs. The database is publicly available at .

[1]  I. Longden,et al.  EMBOSS: the European Molecular Biology Open Software Suite. , 2000, Trends in genetics : TIG.

[2]  Terri K. Attwood,et al.  PRINTS and its automatic supplement, prePRINTS , 2003, Nucleic Acids Res..

[3]  J. Fickett Recognition of protein coding regions in DNA sequences. , 1982, Nucleic acids research.

[4]  Sébastien Carrère,et al.  The ProDom database of protein domain families: more emphasis on 3D , 2004, Nucleic Acids Res..

[5]  Cathy H. Wu,et al.  InterPro, progress and status in 2005 , 2004, Nucleic Acids Res..

[6]  A. D. McLachlan,et al.  A method for measuring the non-random bias of a codon usage table. , 1984, Nucleic acids research.

[7]  V. Solovyev,et al.  Ab initio gene finding in Drosophila genomic DNA. , 2000, Genome research.

[8]  Peer Bork,et al.  Recent improvements to the SMART domain-based sequence annotation resource , 2002, Nucleic Acids Res..

[9]  S. Lewis,et al.  The generic genome browser: a building block for a model organism system database. , 2002, Genome research.

[10]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[11]  Owen White,et al.  The TIGRFAMs database of protein families , 2003, Nucleic Acids Res..

[12]  Rolf Apweiler,et al.  InterProScan - an integration platform for the signature-recognition methods in InterPro , 2001, Bioinform..

[13]  Val Tannen,et al.  K2/Kleisli and GUS: Experiments in integrated access to genomic data sources , 2001, IBM Syst. J..

[14]  Amos Bairoch,et al.  Recent improvements to the PROSITE database , 2004, Nucleic Acids Res..

[15]  R. Durbin,et al.  GeneWise and Genomewise. , 2004, Genome research.