Comparison of common homology modeling algorithms: application of user-defined alignments.

The number of known three-dimensional protein sequences is orders of magnitude higher than the number of known protein structures. This is a result of an increase in large-scale genomic sequencing projects, the inability of proteins to crystallize or crystals to diffract well, or a simple lack of resources. An alternative is to use one of a variety of available homology modeling programs to produce a computational model of a protein. Protein models are produced using information from known protein structures found to be similar. Here, we compare the ability of a number of popular homology modeling programs to produce quality models from user-defined target-template sequence alignments over a range of circumstances including low sequence identity, variable sequence length, and when interfaced with a protein or small molecule. Programs evaluated include Prime, SWISS-MODEL, MOE, MODELLER, ROSETTA, Composer, ORCHESTRAR, and I-TASSER. Proteins to be modeled were chosen to test a range of sequence identities, sequence lengths, and protein motifs and all are of scientific importance. These include HIV-1 protease, kinases, dihydrofolate reductase, a viral capsid protein, and factor Xa among others. For the most part, the programs produce results that are similar. For example, all programs are able to produce reasonable models when sequence identities are >30% and all programs have difficulties producing complete models when sequence identities are lower. However, certain programs fare slightly better than others in certain situations and we attempt to provide insight on this topic.

[1]  M. Levitt Accurate modeling of protein conformation by automatic segment matching. , 1992, Journal of molecular biology.

[2]  Ronald M. Levy,et al.  The SGB/NP hydration free energy model based on the surface generalized born solvent reaction field and novel nonpolar hydration free energy estimators , 2002, J. Comput. Chem..

[3]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[4]  T L Blundell,et al.  A variable gap penalty function and feature weights for protein 3-D structure comparisons. , 1992, Protein engineering.

[5]  R. Friesner,et al.  Novel procedure for modeling ligand/receptor induced fit effects. , 2006, Journal of medicinal chemistry.

[6]  Akbar Nayeem,et al.  A comparative study of available software for high‐accuracy homology modeling: From sequence alignments to structural models , 2006, Protein science : a publication of the Protein Society.

[7]  T. Klabunde,et al.  Structure-based drug discovery using GPCR homology modeling: successful virtual screening for antagonists of the alpha1A adrenergic receptor. , 2005, Journal of medicinal chemistry.

[8]  T. Blundell,et al.  Knowledge based modelling of homologous proteins, Part I: Three-dimensional frameworks derived from the simultaneous superposition of multiple structures. , 1987, Protein engineering.

[9]  M. Burghammer,et al.  Crystal structure of the human β2 adrenergic G-protein-coupled receptor , 2007, Nature.

[10]  R. Friesner,et al.  Evaluation and Reparametrization of the OPLS-AA Force Field for Proteins via Comparison with Accurate Quantum Chemical Calculations on Peptides† , 2001 .

[11]  M. Sippl,et al.  Detection of native‐like models for amino acid sequences of unknown three‐dimensional structure in a data base of known protein conformations , 1992, Proteins.

[12]  Cathy H. Wu,et al.  The Universal Protein Resource (UniProt): an expanding universe of protein information , 2005, Nucleic Acids Res..

[13]  G. Klebe,et al.  Successful virtual screening for a submicromolar antagonist of the neurokinin-1 receptor based on a ligand-supported homology model. , 2004, Journal of medicinal chemistry.

[14]  B. Honig,et al.  A hierarchical approach to all‐atom protein loop prediction , 2004, Proteins.

[15]  T. Blundell,et al.  Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. , 1990, Journal of molecular biology.

[16]  T L Blundell,et al.  FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. , 2001, Journal of molecular biology.

[17]  Anna Tramontano,et al.  The assessment of methods for protein structure prediction. , 2008, Methods in molecular biology.

[18]  C. Deane,et al.  CODA: A combined algorithm for predicting the structurally variable regions of protein models , 2001, Protein science : a publication of the Protein Society.

[19]  David E. Kim,et al.  Physically realistic homology models built with ROSETTA can be more accurate than their templates. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[20]  Matthias Keil,et al.  Comparison of composer and ORCHESTRAR , 2008, Proteins.

[21]  T L Blundell,et al.  Knowledge based modelling of homologous proteins, Part II: Rules for the conformations of substituted sidechains. , 1987, Protein engineering.

[22]  D. Schomburg,et al.  Prediction of protein three-dimensional structures in insertion and deletion regions: a procedure for searching data bases of representative protein fragments using geometric scoring criteria. , 1995, Journal of molecular biology.

[23]  Arne Elofsson,et al.  All are not equal: A benchmark of different homology modeling programs , 2005, Protein science : a publication of the Protein Society.

[24]  R. Stevens,et al.  GPCR Engineering Yields High-Resolution Structural Insights into β2-Adrenergic Receptor Function , 2007, Science.

[25]  T. Blundell,et al.  Comparative protein modelling by satisfaction of spatial restraints. , 1993, Journal of molecular biology.

[26]  Yang Zhang,et al.  I-TASSER: a unified platform for automated protein structure and function prediction , 2010, Nature Protocols.

[27]  John P. Overington,et al.  HOMSTRAD: A database of protein structure alignments for homologous families , 1998, Protein science : a publication of the Protein Society.

[28]  Tom L. Blundell,et al.  CHORAL: a differential geometry approach to the prediction of the cores of protein structures , 2005, Bioinform..

[29]  Jiye Shi,et al.  HOMSTRAD: adding sequence information to structure-based alignments of homologous protein families , 2001, Bioinform..

[30]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[31]  A. Sali,et al.  A composite score for predicting errors in protein structure models , 2006, Protein science : a publication of the Protein Society.

[32]  R. Stevens,et al.  High-Resolution Crystal Structure of an Engineered Human β2-Adrenergic G Protein–Coupled Receptor , 2007, Science.

[33]  Ruben Abagyan,et al.  ICM—A new method for protein modeling and design: Applications to docking and structure prediction from the distorted native conformation , 1994, J. Comput. Chem..

[34]  Manuel C. Peitsch,et al.  SWISS-MODEL: an automated protein homology-modeling server , 2003, Nucleic Acids Res..

[35]  W. L. Jorgensen,et al.  Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids , 1996 .

[36]  David F. Burke,et al.  Andante: reducing side-chain rotamer search space during comparative modeling using environment-specific substitution probabilities , 2007, Bioinform..

[37]  A. Sali,et al.  Statistical potential for assessment and prediction of protein structures , 2006, Protein science : a publication of the Protein Society.

[38]  M C Peitsch,et al.  ProMod and Swiss-Model: Internet-based tools for automated comparative protein modelling. , 1996, Biochemical Society transactions.