Identification of domains in protein structures from the analysis of intramolecular interactions.

The subdivision of protein structures into smaller, independent structural domains is fundamentally important for understanding protein evolution and function, for developing protein classification methods, and for interpreting experimental data. With the rapid growth in the number of solved protein structures, the need for new, accurate algorithmic methods has become increasingly urgent. In this paper, we propose a new computational approach based on the concept of a domain as a compact and independent folding unit and on the analysis of residue-residue interaction energies obtained from classical all-atom force field calculations. In particular, starting from the nonbonded interaction energy matrix associated with a protein, our method filters out and selects only those subsets of interactions that define possible independent folding nuclei within a complex protein structure. This allows different protein fragments to be grouped into energy clusters, which are found to correspond to structural domains. The strategy has been tested on appropriate benchmark data sets, and the results show that the new approach is fast and reliable in determining the number of domains in a fully ab initio manner, without any training set or prior knowledge of the systems under examination. Moreover, by identifying the residues most relevant to the stabilization of each domain, our method may complement the results of other classification techniques and may provide useful information for designing and guiding new experiments.
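The general idea of grouping residues into energy clusters from a nonbonded interaction energy matrix can be illustrated with a small sketch. This is not the filtering procedure used in the paper, which the abstract does not specify. It substitutes a plain energy cutoff followed by connected-component grouping, and the function name `energy_clusters`, the `cutoff` parameter, and the toy matrix are all hypothetical.

```python
from typing import Dict, List


def energy_clusters(energy: List[List[float]], cutoff: float) -> List[List[int]]:
    """Group residue indices linked by pair energies below `cutoff`.

    `energy` is assumed to be a symmetric residue-residue nonbonded
    interaction energy matrix (more negative = more stabilizing).
    A simple cutoff stands in for the paper's interaction filter.
    """
    n = len(energy)
    parent = list(range(n))  # union-find forest, one node per residue

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of residues whose interaction passes the cutoff.
    for i in range(n):
        for j in range(i + 1, n):
            if energy[i][j] < cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Collect residues by root; each group is one candidate energy cluster.
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy 6-residue matrix (made-up values): two tightly coupled blocks,
# residues 0-2 and 3-5, with only weak coupling between the blocks.
toy = [
    [0.0 if i == j else (-5.0 if (i < 3) == (j < 3) else -0.1) for j in range(6)]
    for i in range(6)
]

print(energy_clusters(toy, cutoff=-1.0))
```

With the toy matrix, a cutoff of -1.0 keeps only the strong intra-block interactions, so the two blocks come out as separate clusters. A real energy matrix would need a less naive selection of the relevant interactions than a single fixed threshold.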
