Ab initio protein structure prediction on a genomic scale: Application to the Mycoplasma genitalium genome

An ab initio protein structure prediction procedure, TOUCHSTONE, was applied to all 85 small proteins of the Mycoplasma genitalium genome. TOUCHSTONE is based on Monte Carlo refinement of a lattice model of proteins and uses threading-based tertiary restraints. These restraints are derived by extracting consensus contacts and local secondary structure from at least weakly scoring structures that, in some cases, lack any global similarity to the sequence of interest. The native fold was selected using two criteria: the convergence of the simulations from two different conformational search schemes, and the lowest-energy structure according to a knowledge-based, atomic-detail potential. For 34 of the 85 proteins, those with significant threading hits, the template structures were reasonably well reproduced. Of the remaining 51 proteins, 29 converged to five or fewer clusters. In the test set, 84.8% of the proteins that converged to five or fewer clusters had a correct fold among the clusters. If this rate is applied directly, 24 proteins (84.8% of the 29) may have correct folds. Thus, the topology of a total of 58 proteins (34 + 24) has probably been predicted correctly. These results suggest that ab initio protein structure prediction is becoming a practical approach.
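The estimate of 58 proteins follows from simple arithmetic on the counts given above, which the following minimal sketch reproduces. The function name and parameters are hypothetical, and rounding the expected count down to a whole protein is an assumption chosen because it matches the reported figure of 24; the abstract does not state the rounding rule.

```python
import math


def estimate_correct_topologies(n_template_ok, n_converged, test_success_rate):
    """Extrapolate a test-set success rate to the converged ab initio cases.

    n_template_ok     -- proteins whose threading templates were reproduced
    n_converged       -- remaining proteins that converged to <= 5 clusters
    test_success_rate -- fraction of converged test-set proteins with a
                         correct fold among the clusters
    """
    # Expected number of converged proteins with a correct fold.
    expected = n_converged * test_success_rate
    # Assumption: round down to a whole protein count, which matches
    # the figure of 24 quoted in the abstract.
    n_ab_initio = math.floor(expected)
    return expected, n_ab_initio, n_template_ok + n_ab_initio


# Counts taken from the abstract: 34 template-based, 29 converged, 84.8%.
expected, n_ab_initio, total = estimate_correct_topologies(34, 29, 0.848)
print(f"expected correct among converged: {expected:.3f}")
print(f"whole proteins counted: {n_ab_initio}")
print(f"estimated correct topologies: {total} of 85")
```

The estimate assumes that the 29 converged genome proteins behave like the test set, so it is an expectation rather than a per-protein guarantee.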
