A “FRankenstein's monster” approach to comparative modeling: Merging the finest fragments of Fold‐Recognition models and iterative model refinement aided by 3D structure evaluation

We applied a new multi‐step protocol to predict the structures of all targets during CASP5, regardless of their potential category. 1) We used diverse fold‐recognition (FR) methods to generate initial target‐template alignments, which were converted into preliminary full‐atom models by comparative modeling. All preliminary models were evaluated (scored) with VERIFY3D to identify well‐ and poorly‐folded fragments. 2) Preliminary models with similar 3D folds were superimposed, poorly scoring regions were deleted, and an “average model” structure was created by merging the remaining segments. All template structures reported by the FR methods were superimposed, and a composite multiple‐structure template was created from the most conserved fragments. 3) The average model was superimposed onto the composite template, and the structure‐based target‐template alignment was inferred. This alignment was used to build a new (intermediate) comparative model of the target, which was again scored with VERIFY3D. 4) For each poorly scoring region, a series of alternative alignments was generated by progressively shifting the “unfit” sequence fragment in either direction. Here, we considered additional information, such as secondary structure, the placement of insertions and deletions in loops, the conservation of putative catalytic residues, and the need to obtain a compact, well‐folded structure. For each alternative alignment, a new model was built and evaluated. 5) All models were superimposed, and the “FRankenstein's monster” (FR, fold recognition) model was built from the best‐scoring segments. The final model was obtained after limited energy minimization to remove steric clashes between side chains from different fragments. The novelty of this approach lies in its focus on “vertical” recombination of structure fragments, typical of the ab initio field, rather than on the “horizontal” sequence alignment typical of comparative modeling.
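Step 1 flags poorly folded fragments from VERIFY3D scores. The sketch below shows one plausible way to turn per‐residue 3D‐1D profile scores into flagged segments. It averages the scores over a centred sliding window and reports contiguous runs below a cutoff.

The following are assumptions chosen for illustration, not parameters taken from this protocol:
- the function names;
- the 21‐residue window;
- the 0.2 cutoff.

The per‐residue scores themselves would come from VERIFY3D, which is not reimplemented here.

```python
from typing import List, Sequence, Tuple


def window_average(scores: Sequence[float], window: int = 21) -> List[float]:
    """Average per-residue scores over a centred sliding window.

    Near the chain termini the window is truncated to the residues that exist.
    The 21-residue default is an assumption for illustration.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    half = window // 2
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        segment = scores[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def poor_segments(scores: Sequence[float], threshold: float = 0.2,
                  window: int = 21) -> List[Tuple[int, int]]:
    """Return 0-based (start, end) pairs, end exclusive, of runs whose
    smoothed score falls below the (hypothetical) threshold."""
    smoothed = window_average(scores, window)
    segments = []
    start = None
    for i, s in enumerate(smoothed):
        if s < threshold and start is None:
            start = i  # a poorly scoring run begins here
        elif s >= threshold and start is not None:
            segments.append((start, i))  # the run ended at residue i - 1
            start = None
    if start is not None:  # the run extends to the C-terminus
        segments.append((start, len(smoothed)))
    return segments
```

For example, `poor_segments([0.5] * 10 + [-0.5] * 10 + [0.5] * 10, threshold=0.0, window=3)` returns `[(10, 20)]`. This is the stretch of negative scores in the middle of the chain.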
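Step 2 merges the well‐scoring parts of several superimposed models into an “average model”. The minimal sketch below makes three assumptions:
- the models are already superimposed in a common frame;
- they are indexed by target residue;
- each residue is reduced to a single representative atom (e.g. Cα).

The function name and the residue‐level `keep` masks are hypothetical. A mask could be, for instance, the complement of the poorly scoring segments. Superposition itself is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def average_model(models: Sequence[Sequence[Optional[Coord]]],
                  keep: Sequence[Sequence[bool]]) -> List[Optional[Coord]]:
    """Average one representative atom per residue over the models that
    keep that residue.

    models[m][i] is the coordinate of target residue i in model m (None if
    the model does not cover it); keep[m][i] is False where residue i was
    deleted from model m because it scored poorly. Residues kept by no model
    are returned as None, i.e. left as gaps.
    """
    if len(models) != len(keep) or not models:
        raise ValueError("need one keep-mask per model and at least one model")
    n_res = len(models[0])
    if any(len(m) != n_res for m in models) or any(len(k) != n_res for k in keep):
        raise ValueError("all models and masks must cover the same residues")

    averaged: List[Optional[Coord]] = []
    for i in range(n_res):
        picked = [m[i] for m, k in zip(models, keep) if k[i] and m[i] is not None]
        if not picked:
            averaged.append(None)  # no model retained this residue
            continue
        n = len(picked)
        averaged.append((sum(c[0] for c in picked) / n,
                         sum(c[1] for c in picked) / n,
                         sum(c[2] for c in picked) / n))
    return averaged
```

Averaged coordinates are generally not a physically valid structure. In the protocol described above, the average model serves only as a guide. It is superimposed onto the composite template to infer the target‐template alignment.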
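Step 4 generates alternative alignments by shifting an “unfit” fragment. The minimal sketch below assumes an alignment is stored as a list that maps each target residue to a template position (or None for a gap). It enumerates every shift of one fragment up to a hypothetical maximum. The flanking regions stay fixed and the alignment must remain colinear.

Each variant would then be built into a model and scored. The additional criteria mentioned above (secondary structure, indels in loops, catalytic residues, compactness) are not modelled here.

```python
from typing import List, Optional, Tuple

# aln[i] is the template position aligned to target residue i, or None for a gap.
Alignment = List[Optional[int]]


def is_colinear(aln: Alignment, template_len: int) -> bool:
    """True if aligned template positions are in range and strictly increasing."""
    prev = -1
    for t in aln:
        if t is None:
            continue
        if t < 0 or t >= template_len or t <= prev:
            return False
        prev = t
    return True


def shifted_alignments(aln: Alignment, start: int, end: int,
                       max_shift: int, template_len: int) -> List[Tuple[int, Alignment]]:
    """Shift the target fragment [start, end) by -max_shift..+max_shift
    template positions (excluding 0) and return the (shift, alignment) pairs
    that remain colinear. Flanking residues keep their original positions, so
    each shift implicitly moves an insertion or deletion to the fragment ends.
    """
    variants = []
    for shift in range(-max_shift, max_shift + 1):
        if shift == 0:
            continue
        new = list(aln)
        for i in range(start, end):
            t = aln[i]
            if t is not None:  # gaps inside the fragment stay gaps
                new[i] = t + shift
        if is_colinear(new, template_len):
            variants.append((shift, new))
    return variants
```

Consider `aln = [0, 1, 4, 5, 8, 9]` with a 10‐residue template, where the fragment covers target residues 2 and 3. Shifts of -2, -1, +1 and +2 are accepted. Shifts of ±3 are rejected because they collide with the fixed flanks.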
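Step 5 assembles the final hybrid from the best‐scoring segments of all superimposed models. This minimal sketch assumes the models share target residue numbering and that segment boundaries have already been chosen. For each segment, it picks the model with the highest mean per‐residue score and splices the corresponding coordinates together.

The function names and the boundary representation are hypothetical. The limited energy minimization that the protocol uses to remove side‐chain clashes between fragments is not reproduced.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def best_model_per_segment(scores_by_model: Sequence[Sequence[float]],
                           boundaries: Sequence[int]) -> List[int]:
    """For consecutive cut points lo < hi in `boundaries` (e.g. [0, 40, 95, n]),
    return the index of the model with the highest mean score on [lo, hi).
    Ties go to the lower model index."""
    choice = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        if hi <= lo:
            raise ValueError("boundaries must be strictly increasing")
        means = [sum(s[lo:hi]) / (hi - lo) for s in scores_by_model]
        choice.append(means.index(max(means)))
    return choice


def splice(coords_by_model: Sequence[Sequence[T]], boundaries: Sequence[int],
           choice: Sequence[int]) -> List[T]:
    """Concatenate, segment by segment, the residues of the chosen model."""
    hybrid: List[T] = []
    for (lo, hi), m in zip(zip(boundaries, boundaries[1:]), choice):
        hybrid.extend(coords_by_model[m][lo:hi])
    return hybrid
```

Choosing where to cut is left to the caller. The sketch only decides which model supplies each segment.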
We tested the usefulness of the “FRankenstein” approach for non‐expert predictors. Only the leader of our team had considerable experience in protein modeling; he registered as a separate group (020) and submitted models built only by himself. At the onset of CASP5, the other five members of the team (students) had little or no experience with modeling. They followed the same protocol in a deliberately naïve way. In the fourth step, they used solely the VERIFY3D criterion to compare their models with the leader's model (the latter regarded as just one of many alternatives). They then either generated a hybrid or selected a single model for submission (group 517). To compare our protocol with the traditional “one target‐one template‐one alignment” approach, we submitted (as a separate group, 242) models selected from those generated automatically by all CAFASP servers (i.e., obtained without any human intervention). Here, we compare the results obtained by the three “groups”, describe the successes and failures of the “FRankenstein” approach, and discuss future developments in comparative modeling. An automatic version of our multi‐step protocol is being developed as a meta‐server; the prototype is freely available at http://genesilico.pl/meta/. Proteins 2003;53:369–379. © 2003 Wiley‐Liss, Inc.
