Comparative protein structure modeling by iterative alignment, model building and model assessment.

Comparative or homology protein structure modeling is severely limited by errors in the alignment of a modeled sequence with related proteins of known three-dimensional structure. To ameliorate this problem, we have developed an automated method that optimizes both the alignment and the model implied by it. This task is achieved by a genetic algorithm protocol that starts with a set of initial alignments and then iterates through re-alignment, model building and model assessment to optimize a model assessment score. During this iterative process: (i) new alignments are constructed by applying a number of operators, such as alignment mutations and cross-overs; (ii) comparative models corresponding to these alignments are built by satisfaction of spatial restraints, as implemented in our program MODELLER; and (iii) the models are assessed by a variety of criteria, partly depending on an atomic statistical potential. When the procedure was tested on a very difficult set of 19 modeling targets sharing only 4–27% sequence identity with their template structures, the average alignment accuracy increased from 37% for the initial alignments to 45% for the final alignments (alignment accuracy was measured as the percentage of positions in the tested alignment that were identical to the reference structure-based alignment). Correspondingly, the average model accuracy increased from 43 to 54% (model accuracy was measured as the percentage of the Cα atoms of the model that were within 5 Å of the corresponding Cα atoms in the superposed native structure). The method also compares favorably with two of the most successful previously described methods, PSI-BLAST and SAM. The accuracy of the final models would increase further if a better method for ranking the models were available.
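The iterative protocol can be illustrated with a small genetic algorithm. This is a minimal sketch under stated assumptions, not the authors' implementation: an alignment is represented as a tuple that maps each target residue to a template residue index (or `None` for a gap); `shift_mutation`, `gap_mutation` and `crossover` are hypothetical, simplified stand-ins for the paper's alignment mutations and cross-overs; and steps (ii) and (iii), model building with MODELLER and model assessment, are collapsed into a caller-supplied `score` function.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical representation: entry i is the template residue aligned to target residue i, or None.
Alignment = Tuple[Optional[int], ...]


def repair(aln: Sequence[Optional[int]], m: int) -> Alignment:
    """Gap out positions that break collinearity or fall outside a template of length m."""
    out, last = [], -1
    for j in aln:
        if j is not None and last < j < m:
            out.append(j)
            last = j
        else:
            out.append(None)
    return tuple(out)


def shift_mutation(aln: Alignment, m: int, rng: random.Random) -> Alignment:
    """Hypothetical operator: shift the template indices of a random target window by +/-1."""
    n = len(aln)
    a = rng.randrange(n)
    b = rng.randrange(a, n) + 1
    delta = rng.choice((-1, 1))
    out = [j + delta if j is not None and a <= i < b else j for i, j in enumerate(aln)]
    return repair(out, m)


def gap_mutation(aln: Alignment, m: int, rng: random.Random) -> Alignment:
    """Hypothetical operator: gap an aligned position, or align a gapped one after its predecessor."""
    i = rng.randrange(len(aln))
    out = list(aln)
    if out[i] is not None:
        out[i] = None
    else:
        out[i] = max((j for j in out[:i] if j is not None), default=-1) + 1
    return repair(out, m)


def crossover(p1: Alignment, p2: Alignment, rng: random.Random) -> Alignment:
    """One-point crossover: target positions before the cut come from p1, the rest from p2."""
    cut = rng.randrange(1, len(p1)) if len(p1) > 1 else 0
    return p1[:cut] + p2[cut:]


def optimize(initial: Sequence[Alignment], m: int, score: Callable[[Alignment], float],
             generations: int = 100, pop_size: int = 20, seed: int = 0) -> Alignment:
    """Iterate re-alignment and scoring; `score` stands in for model building plus assessment."""
    rng = random.Random(seed)
    pop = [repair(a, m) for a in initial]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            child = repair(crossover(rng.choice(pop), rng.choice(pop), rng), m)
            op = rng.choice((shift_mutation, gap_mutation))
            children.append(op(child, m, rng))
        # Elitist selection: keep the best-scoring unique alignments (parents and children).
        pop = sorted(dict.fromkeys(pop + children), key=score, reverse=True)[:pop_size]
    return pop[0]


# Toy usage. The "reference" is known here only for demonstration; in practice it is not,
# which is why a model assessment score (model plus statistical potential) is optimized instead.
reference = tuple(range(10))


def toy_score(aln: Alignment) -> int:
    return sum(1 for a, r in zip(aln, reference) if a is not None and a == r)


initial = [(None,) + tuple(range(9)), tuple(range(1, 10)) + (None,)]
best = optimize(initial, m=10, score=toy_score)
print(best, toy_score(best))
```

Selection in this sketch is elitist, so the best score in the population can never decrease between generations; the population size, operator probabilities and selection scheme of the actual protocol are not specified in this abstract and are chosen here arbitrarily.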
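The two accuracy measures used in the evaluation can be written down directly. The sketch below is illustrative and rests on assumptions not stated in the abstract: alignments are tuples giving, for each target residue, the aligned template residue index or `None`; alignment accuracy is counted over the target residues that are aligned in the reference structure-based alignment (one reasonable reading of "positions"); and the model and native Cα coordinates are assumed to be already superposed and index-matched, so the superposition step itself is not shown. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def alignment_accuracy(test: Sequence[Optional[int]],
                       reference: Sequence[Optional[int]]) -> float:
    """Percent of reference-aligned target residues whose template partner `test` reproduces."""
    aligned = [i for i, r in enumerate(reference) if r is not None]
    if not aligned:
        return 0.0
    correct = sum(1 for i in aligned if test[i] == reference[i])
    return 100.0 * correct / len(aligned)


def model_accuracy(model_ca: Sequence[Coord], native_ca: Sequence[Coord],
                   cutoff: float = 5.0) -> float:
    """Percent of model C-alpha atoms within `cutoff` Angstroms of the matching native C-alpha.

    Assumes both coordinate lists are already superposed and index-matched.
    """
    if len(model_ca) != len(native_ca):
        raise ValueError("model and native must have the same number of C-alpha atoms")
    if not model_ca:
        return 0.0
    close = sum(1 for p, q in zip(model_ca, native_ca) if math.dist(p, q) <= cutoff)
    return 100.0 * close / len(model_ca)


print(alignment_accuracy((0, 1, 3, None, None), (0, 1, 2, None, 3)))  # 50.0
```

In the usage line, the reference aligns four target residues; the test alignment reproduces two of them, giving 50%.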
