Protein flexibility and rigidity predicted from sequence

Structural flexibility has been associated with various biological processes such as molecular recognition and catalytic activity. In silico studies of protein flexibility have attempted to characterize and predict flexible regions based on simple principles. B‐values derived from experimental data are widely used to measure residue flexibility. Here, we present the most comprehensive large‐scale analysis of B‐values. We used this analysis to develop a neural network–based method that predicts flexible–rigid residues from amino acid sequence. The system uses both global and local information: global features describe the entire protein (secondary structure composition, protein length, and fraction of surface residues), and local features are taken from a window of sequence‐consecutive residues. The most important local feature was the evolutionary exchange profile reflecting sequence conservation in a family of related proteins. To illustrate its potential, we applied our method to four case studies, each of which related our predictions to aspects of function. The first two were the prediction of regions that undergo conformational switches upon environmental changes (switch II region in Ras) and the prediction of surface regions whose rigidity is crucial for their function (tunnel in propeller folds). Our method captured both correctly. The third study established that residues in active sites of enzymes are predicted by our method to have unexpectedly low B‐values. The final study demonstrated how well our predictions correlated with NMR order parameters that reflect motion. Our method had not been set up to address any of the tasks in those four case studies. Therefore, we expect that this method will assist in many attempts at inferring aspects of function. Proteins 2005. © 2005 Wiley‐Liss, Inc.
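The combination of local and global input described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hot residue encoding stands in for the evolutionary exchange profile, the amino acid composition and scaled length stand in for the global features named above, and all function names, the window half-width, and the length cap of 1000 are assumptions made for illustration only.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> List[float]:
    """Encode one residue as a 20-dimensional indicator vector.

    Assumption: a one-hot code replaces the evolutionary profile,
    which would require a multiple sequence alignment.
    """
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def global_features(sequence: str) -> List[float]:
    """Whole-protein features shared by every residue.

    Assumption: amino acid composition and a capped, scaled length are
    simple stand-ins for the global features listed in the abstract.
    """
    n = len(sequence)
    composition = [sequence.count(aa) / n for aa in AMINO_ACIDS]
    return composition + [min(n, 1000) / 1000.0]


def window_features(sequence: str, center: int, half_width: int) -> List[float]:
    """Concatenate encodings of the residues in a window around `center`.

    Positions that fall outside the chain are padded with zeros.
    """
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.extend(one_hot(sequence[pos]))
        else:
            features.extend([0.0] * len(AMINO_ACIDS))
    return features


def build_inputs(sequence: str, half_width: int = 4) -> List[List[float]]:
    """Return one input vector per residue: local window plus global features."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    shared = global_features(sequence)
    return [
        window_features(sequence, i, half_width) + shared
        for i in range(len(sequence))
    ]
```

Each vector returned by the hypothetical `build_inputs` could then be passed to any per-residue classifier that labels a residue as flexible or rigid. The window half-width controls how much sequence context each residue sees, while the shared global block is identical for all residues of one protein.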
