The IDB and IEDB: intron sequence and evolution databases

A non-redundant database of nuclear, protein-encoding, genomic DNA sequences highlighting nuclear pre-mRNA introns was constructed using information contained in the SWISS-PROT and GenBank sequence databases. This Intron DataBase (IDB) contains information about (i) introns (including nucleotide sequence, location, phase, length, GC content and consensus-sequence rule violations), (ii) exons (including nucleotide sequence, length and GC content), (iii) protein coding regions (including amino acid sequence and length), and (iv) descriptive information about the source gene and organism (including gene designations and species taxonomy). The Intron Evolution DataBase (IEDB) provides a statistical analysis of the exon and intron sequences catalogued in IDB as well as data concerning intron penetration (relative number of coding regions with introns), density (number of introns per kb of total coding sequence DNA), distribution, and consensus sequences for each species present in IDB. This supplement is provided to furnish insights into the phylogenetic distribution and evolution of introns. Both databases are extensively cross-referenced to the SWISS-PROT and GenBank databases. IDB currently contains information on over 63 000 genes and 154 000 introns; IEDB summarizes information on over 2800 species. IDB and IEDB will be updated twice a year and are available via the internet (http://nutmeg.bio.indiana.edu/intron/index.html).
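
The per-intron and per-species quantities named above (GC content, phase, penetration and density) can be illustrated with a minimal Python sketch. It is not the code used to build IDB or IEDB. The Gene record and all function names are hypothetical, exons are assumed to hold coding sequence only, and phase is computed under the usual convention that it equals the number of coding nucleotides preceding the intron, modulo 3.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    """Hypothetical record: exons hold coding sequence only (assumption)."""
    exons: List[str]    # exon sequences in transcript order
    introns: List[str]  # introns[i] lies between exons[i] and exons[i + 1]


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def intron_phases(gene: Gene) -> List[int]:
    """Phase of each intron: coding nucleotides upstream of it, modulo 3."""
    phases = []
    coding_so_far = 0
    for exon in gene.exons[:-1]:  # an intron follows every exon but the last
        coding_so_far += len(exon)
        phases.append(coding_so_far % 3)
    return phases


def penetration(genes: List[Gene]) -> float:
    """Fraction of coding regions that contain at least one intron."""
    if not genes:
        return 0.0
    return sum(1 for g in genes if g.introns) / len(genes)


def density(genes: List[Gene]) -> float:
    """Number of introns per kb of total coding sequence."""
    total_coding = sum(len(e) for g in genes for e in g.exons)
    total_introns = sum(len(g.introns) for g in genes)
    if total_coding == 0:
        return 0.0
    return total_introns / (total_coding / 1000)


# Toy data, invented purely for illustration.
genes = [
    Gene(exons=["ATGGCC", "GA", "TTAA"], introns=["GTAAGTAG", "GTCCAG"]),
    Gene(exons=["ATGAAATGA"], introns=[]),
]
print(intron_phases(genes[0]), penetration(genes), density(genes))
```

Pooling all genes of one species before dividing, as density() does, weights each gene by its coding length; averaging per-gene densities would be a different statistic.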