Hinge Atlas: relating protein sequence to sites of structural flexibility

Background: Relating features of protein sequences to structural hinges is important for identifying domain boundaries, understanding structure-function relationships, and designing flexibility into proteins. Efforts in this field have been hampered by the lack of a proper dataset for studying the characteristics of hinges.

Results: Using the Molecular Motions Database, we have created a Hinge Atlas of manually annotated hinges and a statistical formalism for calculating the enrichment of various types of residues in these hinges.

Conclusion: We found various correlations between hinges and sequence features. Some of these are expected; for instance, we found that hinges tend to occur on the surface and in coils and turns, and to be enriched in small and hydrophilic residues. Others are less obvious or intuitive. In particular, we found that hinges tend to coincide with active sites but, unlike active sites, are not at all conserved in evolution. We evaluate the potential for hinge prediction based on sequence.

Motions play an important role in catalysis and protein-ligand interactions. Hinge bending motions comprise the largest class of known motions. It is therefore important to relate the hinge location to sequence features such as residue type, physicochemical class, secondary structure, solvent exposure, evolutionary conservation, and proximity to active sites. To do this, we first generated the Hinge Atlas, a set of protein motions with the hinge locations manually annotated, and then studied the coincidence of these features with the hinge location. We found that all of these features bear on the hinge location. Most interestingly, we found that hinges tend to occur at or near active sites and yet, unlike active sites, are not conserved. Less surprisingly, we found that hinge residues tend to be small, not hydrophobic or aliphatic, and occur in turns and random coils on the surface. We built a working sequence-based hinge predictor that uses some of the data generated in this study. The Hinge Atlas is made available to the community for further flexibility studies.
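
The abstract does not spell out the statistical formalism used to score residue enrichment in hinges. As a rough illustration only, one common way to quantify this kind of enrichment is a log-odds ratio paired with a hypergeometric tail probability. The sketch below assumes that approach; the function name `hinge_enrichment` and its parameters are hypothetical and are not taken from the Hinge Atlas itself.

```python
from math import comb, log2


def hinge_enrichment(n_total, n_hinge, k_total, k_hinge):
    """Score how over-represented one residue type is among hinge residues.

    Assumed inputs (hypothetical names, not from the Hinge Atlas):
      n_total -- residues in the whole dataset
      n_hinge -- residues annotated as hinges
      k_total -- residues of the type of interest in the whole dataset
      k_hinge -- residues of that type among the hinge residues

    Returns (log_odds, p_value): log2 of the hinge frequency over the
    background frequency, and the hypergeometric probability of seeing
    at least k_hinge such residues in n_hinge draws by chance.
    """
    freq_hinge = k_hinge / n_hinge
    freq_all = k_total / n_total
    log_odds = log2(freq_hinge / freq_all) if k_hinge > 0 else float("-inf")

    # Upper tail of the hypergeometric distribution, P(X >= k_hinge),
    # computed with exact integer binomial coefficients.
    upper = min(k_total, n_hinge)
    favourable = sum(
        comb(k_total, i) * comb(n_total - k_total, n_hinge - i)
        for i in range(k_hinge, upper + 1)
    )
    p_value = favourable / comb(n_total, n_hinge)
    return log_odds, p_value


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the study.
    score, p = hinge_enrichment(n_total=1000, n_hinge=50, k_total=80, k_hinge=9)
    print(f"log2 enrichment = {score:.2f}, p = {p:.3g}")
```

A positive log-odds value would indicate that the residue type is more frequent in hinges than in the dataset as a whole, and the tail probability gives a sense of whether that excess could plausibly arise by chance.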
