GTOP: a database of protein structures predicted from genome sequences

Large-scale genome projects generate an unprecedented number of protein sequences, most of which are experimentally uncharacterized. Predicting the 3D structures of these sequences provides important clues to their functions. We constructed the Genomes TO Protein structures and functions (GTOP) database, which contains protein fold predictions for a large number of sequences. Predictions are carried out mainly with the homology search program PSI-BLAST, currently the most popular of the high-sensitivity profile search methods. GTOP also includes the results of other analyses, e.g. homology and motif searches and the detection of transmembrane helices and repetitive sequences. We have completed the analysis of sequences from 41 organisms, comprising more than 120 000 proteins in total. GTOP uses a graphical viewer to present the analytical results for each ORF on a single page in a 'color-bar' format. The assigned 3D structures are displayed with the Chime plug-in or RasMol. Ligand binding sites are also included, providing functional information. The GTOP server is available at http://spock.genes.nig.ac.jp/~genome/gtop.html.
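As a rough illustration of the 'color-bar' idea, the Python sketch below lays out analysis results along an ORF as a one-line text bar and reports the fraction of residues covered. It is not GTOP's viewer or data model: the function names, the one-character symbols, and the example coordinates are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A hit is (start, end, symbol): 1-based inclusive residue positions plus a
# one-character label, e.g. "S" for an assigned 3D structure or "T" for a
# predicted transmembrane helix. These labels are illustrative assumptions.
Hit = Tuple[int, int, str]


def color_bar(length: int, hits: List[Hit], empty: str = "-") -> str:
    """Render hits along a sequence of the given length as a text bar.

    Later hits overwrite earlier ones where regions overlap.
    """
    bar = [empty] * length
    for start, end, symbol in hits:
        if not (1 <= start <= end <= length):
            raise ValueError(f"hit {start}-{end} lies outside 1-{length}")
        for i in range(start - 1, end):
            bar[i] = symbol
    return "".join(bar)


def coverage(length: int, hits: List[Hit]) -> float:
    """Return the fraction of residues covered by at least one hit."""
    covered = set()
    for start, end, _ in hits:
        covered.update(range(start, end + 1))
    return len(covered) / length


if __name__ == "__main__":
    # Hypothetical 20-residue ORF with two annotated regions.
    example_hits = [(3, 8, "S"), (12, 15, "T")]
    print(color_bar(20, example_hits))
    print(coverage(20, example_hits))
```

In this toy layout each residue is one character wide; a graphical viewer would instead scale regions to pixel widths and use colors rather than letters.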
