3D‐SHOTGUN: A novel, cooperative, fold‐recognition meta‐predictor

Understanding the biological role of proteins encoded in genome sequences requires knowledge of their three‐dimensional (3D) structure and function. The computational assignment of folds is becoming an increasingly important complement to experimental structure determination. In particular, fold‐recognition methods aim to predict approximate 3D models for proteins bearing no sequence similarity to any protein of known structure. However, fully automated structure‐prediction methods can currently produce reliable models for only a fraction of these sequences. Using a number of semiautomated procedures, human expert predictors are often able to produce more and better predictions than automated methods. We describe a novel, fully automatic, fold‐recognition meta‐predictor, named 3D‐SHOTGUN, which incorporates some of the strategies human predictors have successfully applied. This new method is reminiscent of the so‐called cooperative algorithms of computer vision. The inputs to 3D‐SHOTGUN are the top models predicted by a number of independent fold‐recognition servers. The meta‐predictor consists of three steps: (i) assembly of hybrid models, (ii) confidence assignment, and (iii) selection. We have applied 3D‐SHOTGUN to an unbiased test set of 77 newly released protein structures sharing no sequence similarity to proteins previously released. Forty‐six correct rank‐1 predictions were obtained, 30 of which had scores higher than that of the first incorrect prediction; this is a significant improvement over the performance of all individual servers. Furthermore, the predicted hybrid models were, on average, more similar to their corresponding native structures than those produced by the individual servers. This opens the possibility of generating more accurate, full‐atom homology models for proteins with no sequence similarity to proteins of known structure. These improvements represent a step forward toward the wider applicability of fully automated structure‐prediction methods at genome scales.

Proteins 2003;51:434–441. © 2003 Wiley‐Liss, Inc.
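The abstract names step (ii), confidence assignment, and step (iii), selection. It also names the criterion "scores higher than that of the first incorrect prediction". It gives no formulas for any of these. The Python sketch below illustrates both ideas under stated assumptions. It is a generic consensus scheme and not the published 3D‐SHOTGUN procedure. It makes four assumptions. Each candidate model is a hypothetical `Model` that maps target residue numbers to C‐alpha coordinates. Similarity is measured by a hypothetical, superposition‐free `distance_agreement` function with an assumed 2.0 Å tolerance. A model's confidence is its summed similarity to the other servers' models. Step (i), hybrid‐model assembly, is not shown.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical representation: target residue number -> C-alpha (x, y, z) in Å.
Model = Dict[int, Tuple[float, float, float]]


def distance_agreement(a: Model, b: Model, tol: float = 2.0) -> float:
    """Assumed similarity measure (not the paper's): fraction of residue pairs
    whose intra-model C-alpha distances differ by less than `tol` Å.
    Normalizing by the pair count of the larger model penalizes partial overlap.
    Intra-model distances are rotation-invariant, so no superposition is needed."""
    n = max(len(a), len(b))
    total_pairs = n * (n - 1) // 2
    if total_pairs == 0:
        return 0.0
    common = sorted(set(a) & set(b))
    agree = sum(
        1
        for i, j in combinations(common, 2)
        if abs(math.dist(a[i], a[j]) - math.dist(b[i], b[j])) < tol
    )
    return agree / total_pairs


def consensus_confidence(models: List[Model]) -> List[float]:
    """Assumed confidence: each model's summed similarity to all other models."""
    return [
        sum(distance_agreement(m, other) for k, other in enumerate(models) if k != idx)
        for idx, m in enumerate(models)
    ]


def select(models: List[Model]) -> Tuple[int, float]:
    """Return (index, confidence) of the most supported model; ties go to the first."""
    scores = consensus_confidence(models)
    best = max(range(len(models)), key=scores.__getitem__)
    return best, scores[best]


def correct_above_first_error(predictions: List[Tuple[float, bool]]) -> int:
    """Count correct predictions scoring strictly higher than the best-scoring
    incorrect one, i.e. those ranked before the first error."""
    wrong = [score for score, ok in predictions if not ok]
    threshold = max(wrong) if wrong else float("-inf")
    return sum(1 for score, ok in predictions if ok and score > threshold)


# Toy example: m1 and m2 share an extended chain (m2 is translated); m3 is compact.
m1: Model = {0: (0.0, 0.0, 0.0), 1: (3.8, 0.0, 0.0), 2: (7.6, 0.0, 0.0), 3: (11.4, 0.0, 0.0)}
m2: Model = {k: (x + 10.0, y + 10.0, z + 10.0) for k, (x, y, z) in m1.items()}
m3: Model = {0: (0.0, 0.0, 0.0), 1: (3.8, 0.0, 0.0), 2: (3.8, 3.8, 0.0), 3: (0.0, 3.8, 0.0)}

print(consensus_confidence([m1, m2, m3]))  # → [1.5, 1.5, 1.0]
print(select([m1, m2, m3]))  # → (0, 1.5)
print(correct_above_first_error([(0.9, True), (0.8, True), (0.7, False), (0.75, True)]))  # → 3
```

In this toy case, two models agree on an extended chain and a third proposes a compact fold. The agreeing pair receives the higher confidence. The counting function reads "higher than" strictly, so a correct prediction tied with the best incorrect one is not counted.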
