Xpro: database of eukaryotic protein-encoding genes

Xpro is a relational database that contains all the eukaryotic protein-encoding DNA sequences in GenBank, together with the associated data required for the analysis of eukaryotic gene architecture. In addition to the information found in the GenBank records, which includes the sequence, position, length and description of introns, exons and protein-coding regions, Xpro provides annotations on splice sites and intron phases. Furthermore, Xpro validates intron positions by aligning each record's sequence against EST sequences from dbEST. Alternative splicing information obtained during this validation is also stored in the database. The intron-containing genes in Xpro are classified as experimental or predicted, based on the intron position validation and on specific keywords that appear in the GenBank records of predicted genes. An Entrez-like query system, which is familiar to most biologists, is provided for accessing the information in the database. A non-redundant set of the Xpro database contents is also obtained by cross-referencing to the Swiss-Prot/TrEMBL and Pfam databases. The database currently contains information for 493,983 genes: 351,918 intron-containing genes and 142,065 intron-less genes. Xpro is updated for each new GenBank release and is freely available via the internet at http://origin.bic.nus.edu.sg/xpro.
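The intron phase annotation mentioned above can be illustrated with a minimal Python sketch. It is not Xpro's implementation; the function name `intron_phases` and the input layout are hypothetical. It assumes the usual convention (phase 0: the intron lies between two codons; phase 1: after the first base of a codon; phase 2: after the second base), a complete coding sequence that starts at the first base of the start codon, and coding segments given in transcript order as 1-based inclusive coordinates, as in a GenBank `join(...)` location.

```python
from typing import List, Tuple


def intron_phases(cds_segments: List[Tuple[int, int]]) -> List[int]:
    """Return the phase of each intron lying between consecutive coding segments.

    cds_segments: (start, end) pairs, 1-based and inclusive, in transcript order.
    Assumption: the first segment begins at codon position 1 (complete CDS).
    """
    phases = []
    coding_length = 0
    # Every segment except the last is followed by an intron.
    for start, end in cds_segments[:-1]:
        coding_length += end - start + 1
        # Coding bases seen so far, modulo 3, give the intron's phase.
        phases.append(coding_length % 3)
    return phases


# Hypothetical gene with three coding segments of 90, 100 and 100 bases.
example = [(1, 90), (201, 300), (401, 500)]
print(intron_phases(example))  # [0, 1]
```

The first intron follows 90 coding bases (90 mod 3 = 0) and the second follows 190 (190 mod 3 = 1), so their phases are 0 and 1. An intron-less gene yields an empty list.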
