Advances in structural genomics.

New computational techniques have allowed protein folds to be assigned to all or parts of between a quarter (Caenorhabditis elegans) and a half (Mycoplasma genitalium) of the individual protein sequences in different genomes. These assignments give a new perspective on domain structures, gene duplications, protein families and protein folds in genome sequences.
