Generalized protein structure prediction based on combination of fold‐recognition with de novo folding and evaluation of models

To predict the tertiary structure of full‐length sequences of all targets in CASP6, regardless of their potential category (from easy comparative modeling to fold recognition to apparent new folds), we used a novel combination of two very different approaches developed independently in our laboratories, both of which had ranked quite well in different categories in CASP5. First, the GeneSilico metaserver was used to identify domains, predict secondary structure, and generate fold recognition (FR) alignments, which were converted to full‐atom models using the “FRankenstein's Monster” approach for comparative modeling (CM) by recombination of protein fragments. Additional models generated “de novo” by fully automated servers were obtained from the CASP website. All these models were evaluated by VERIFY3D, and residues with scores better than 0.2 were used as a source of spatial restraints. Second, a new implementation of the lattice‐based protein modeling tool CABS was used to carry out folding guided by the above‐mentioned restraints with the Replica Exchange Monte Carlo sampling technique. Decoys generated in the course of the simulation were subjected to average‐linkage hierarchical clustering. For a representative decoy from each cluster, a full‐atom model was rebuilt. Finally, five models were selected for submission based on a combination of criteria, including the size, density, and average energy of the corresponding cluster, and visual evaluation of the full‐atom structures and their relationship to the original templates. The combination of FRankenstein and CABS was one of the best‐performing algorithms over all categories in CASP6 (it is important to note that our human intervention was very limited, and all steps in our method can be easily automated). We were able to generate a number of very good models, especially in the Comparative Modeling and New Folds categories. Frequently, the best models were closer to the native structure than any of the templates used. The main problem we encountered was in the ranking of the final models (the only step of significant human intervention), owing to insufficient computational power, which precluded full‐atom refinement and energy‐based evaluation. Proteins 2005;Suppl 7:84–90. © 2005 Wiley‐Liss, Inc.
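
Two of the quantitative steps described above, the selection of well‐scoring residues as a source of restraints and the exchange move in Replica Exchange Monte Carlo, can be illustrated with a minimal Python sketch. This is not the CABS or FRankenstein code. All function names are hypothetical, the per‐residue scores are assumed to be already available as a list of numbers, "better than 0.2" is read as strictly greater than 0.2, and temperatures are assumed to be in reduced units (Boltzmann constant set to 1). The swap rule shown is the standard Metropolis criterion for exchanging two replicas.

```python
import math
import random


def select_restraint_residues(scores, threshold=0.2):
    """Return indices of residues whose per-residue score exceeds the threshold.

    `scores` is an assumed input: one quality score per residue of a model.
    """
    return [i for i, s in enumerate(scores) if s > threshold]


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis probability of exchanging replicas i and j.

    Uses min(1, exp[(1/T_i - 1/T_j) * (E_i - E_j)]) in reduced units (k_B = 1).
    """
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swaps(energies, temps, rng):
    """Attempt one sweep of exchanges between neighbouring temperatures.

    Returns a new list in which position k holds the energy of the
    conformation now assigned to temperature temps[k].
    """
    energies = list(energies)
    for k in range(len(temps) - 1):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies


if __name__ == "__main__":
    # Made-up scores, used only to show the threshold filter.
    print(select_restraint_residues([0.35, 0.10, 0.20, 0.50]))
    # The colder replica (T = 1.0) holds the higher energy, so the swap is always accepted.
    print(attempt_swaps([-5.0, -10.0], [1.0, 2.0], random.Random(0)))
```

In a full simulation, each replica would perform ordinary Monte Carlo moves at its own temperature between such exchange sweeps. The exchanges let low‐energy conformations found at high temperature migrate to the low‐temperature replicas.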
