Identification of structural domains in proteins by a graph heuristic

A novel automatic procedure for identifying domains from protein atomic coordinates is presented. The procedure, termed STRUDL (STRUctural Domain Limits), does not use secondary structure information and handles any number of domains made up of contiguous or non‐contiguous chain segments. The core algorithm uses the Kernighan‐Lin graph heuristic to partition the protein into residue sets that display minimal interactions between them. These interactions are deduced from the weighted Voronoi diagram. The generated partitions are accepted or rejected on the basis of optimized criteria representing basic expected physical properties of structural domains. The graph heuristic approach is shown to be very effective: for a number of test proteins, it closely approximates the exact solution provided by a branch and bound algorithm. In addition, the overall performance of STRUDL is assessed on a set of 787 representative proteins from the Protein Data Bank by comparison with the domain definitions in the CATH protein classification. The domains assigned by STRUDL agree with the CATH assignments for at least 81% of the tested proteins. This result is comparable to that obtained previously using PUU (Holm and Sander, Proteins 1994;19:256–268), the only other available algorithm designed to identify domains with any number of non‐contiguous chain segments. A detailed discussion of the structures for which our assignments differ from those in CATH brings to light some clear inconsistencies between the concept of structural domains based on minimizing inter‐domain interactions and that of delimiting structural motifs that represent acceptable folding topologies or architectures. Considering both concepts as complementary and combining them in a layered approach might be the way forward. Proteins 1999;35:338–352. © 1999 Wiley‐Liss, Inc.
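The Kernighan-Lin partitioning step named in the abstract can be illustrated with a minimal sketch in Python. The sketch assumes the protein has already been reduced to a symmetric matrix of residue-residue interaction weights. The weights in the toy example are hypothetical integers that stand in for the Voronoi-derived interactions. All names (`contact_matrix`, `cut_weight`, `kernighan_lin`, `exact_bisection`, `TOY`) are illustrative. This is the classic balanced two-way Kernighan-Lin refinement, not the STRUDL implementation. It does not reproduce STRUDL's interaction measure, its handling of unequal or multiple domains, or its acceptance criteria. `exact_bisection` is plain exhaustive enumeration. It is used here in place of the branch and bound algorithm only to check the heuristic on a tiny case.

```python
from itertools import combinations

# Sketch only: classic Kernighan-Lin bisection, NOT the STRUDL implementation.
# Interaction weights are hypothetical stand-ins for Voronoi-derived contacts.


def contact_matrix(n, contacts):
    """Symmetric n x n weight matrix built from (i, j, weight) residue contacts."""
    w = [[0] * n for _ in range(n)]
    for i, j, c in contacts:
        w[i][j] = w[j][i] = c
    return w


def cut_weight(w, part):
    """Total interaction weight between `part` and the remaining residues."""
    inside = set(part)
    return sum(w[i][j] for i in inside for j in range(len(w)) if j not in inside)


def kernighan_lin(w, part_a, max_passes=20):
    """Refine a two-way split of equal-size sides to reduce the cut weight."""
    n = len(w)
    a = set(part_a)
    b = set(range(n)) - a
    for _ in range(max_passes):
        # d[v] = external minus internal interaction weight of residue v.
        d = {}
        for v in range(n):
            side = a if v in a else b
            d[v] = sum(-w[v][u] if u in side else w[v][u] for u in range(n) if u != v)
        free_a, free_b = set(a), set(b)
        gains, swaps = [], []
        for _ in range(min(len(a), len(b))):
            # Best tentative swap among residues not yet locked in this pass.
            g, x, y = max((d[x] + d[y] - 2 * w[x][y], x, y)
                          for x in free_a for y in free_b)
            gains.append(g)
            swaps.append((x, y))
            free_a.remove(x)
            free_b.remove(y)
            # Update d as if x had moved to side B and y to side A.
            for u in free_a:
                d[u] += 2 * w[u][x] - 2 * w[u][y]
            for u in free_b:
                d[u] += 2 * w[u][y] - 2 * w[u][x]
        # Apply only the prefix of swaps with the largest positive total gain.
        best_k, best_total, total = 0, 0, 0
        for k, g in enumerate(gains, start=1):
            total += g
            if total > best_total + 1e-12:
                best_k, best_total = k, total
        if best_k == 0:
            break  # no improving prefix: local optimum reached
        for x, y in swaps[:best_k]:
            a.remove(x)
            b.add(x)
            b.remove(y)
            a.add(y)
    return a, b


def exact_bisection(w):
    """Exhaustive minimum balanced bisection (feasible only for tiny n)."""
    n = len(w)
    best = min(combinations(range(n), n // 2), key=lambda s: cut_weight(w, s))
    return set(best), set(range(n)) - set(best)


# Hypothetical toy "protein": residues 0-3 and 4-7 form two compact units,
# joined by a chain link (3-4) and one weak long-range contact (1-6).
TOY = contact_matrix(8, [
    (0, 1, 3), (1, 2, 3), (2, 3, 3), (0, 2, 2), (1, 3, 2),
    (4, 5, 3), (5, 6, 3), (6, 7, 3), (4, 6, 2), (5, 7, 2),
    (3, 4, 1), (1, 6, 1),
])

a, b = kernighan_lin(TOY, [0, 2, 4, 6])  # start from an interleaved split
print(sorted(a), sorted(b), cut_weight(TOY, a))  # → [0, 1, 2, 3] [4, 5, 6, 7] 2
print(cut_weight(TOY, exact_bisection(TOY)[0]))  # → 2
```

The refinement starts from an interleaved split with cut weight 20 and recovers the two compact units with cut weight 2. On this tiny case, that result matches the exhaustive optimum. Classic Kernighan-Lin keeps both sides the same size. A domain-finding procedure would therefore presumably need to try different size ratios or recurse on each part. The abstract does not say how STRUDL handles this.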

[1] Brian W. Kernighan et al. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J., 1970.

[2] M B Swindells et al. A procedure for detecting structural domains in proteins. Protein Science, 1995.

[3] C. Chothia et al. Structural patterns in globular proteins. Nature, 1976.

[4] G J Barton et al. Continuous and discontinuous domains: An algorithm for the automatic generation of reliable protein domain definitions. Protein Science, 1995.

[5] P. Kraulis. A program to produce both detailed and schematic plots of protein structures. 1991.

[6] D. Wetlaufer. Nucleation, rapid folding, and globular intrachain regions in proteins. Proceedings of the National Academy of Sciences of the United States of America, 1973.

[7] David S. Johnson et al. Computers and Intractability: A Guide to the Theory of NP-Completeness. 1978.

[8] M J Sternberg et al. Identification and analysis of domains in proteins. Protein Engineering, 1995.

[9] F. M. Richards et al. Calculation of molecular volumes and areas for structures of known geometry. Methods in Enzymology, 1985.

[10] J L Finney et al. Calculation of protein volumes: an alternative to the Voronoi procedure. Journal of Molecular Biology, 1982.

[11] D Eisenberg et al. Oligomer formation by 3D domain swapping: a model for protein assembly and misassembly. Advances in Protein Chemistry, 1997.

[12] C Chothia et al. Domains in proteins: definitions, location, and structural principles. Methods in Enzymology, 1985.

[13] G. Rose et al. Hierarchic organization of domains in globular proteins. Journal of Molecular Biology, 1979.

[14] S. Wodak et al. Deviations from standard atomic volumes as a quality measure for protein crystal structures. Journal of Molecular Biology, 1996.

[15] M. Levitt et al. The volume of atoms on the protein surface: calculated from simulation, using Voronoi polyhedra. Journal of Molecular Biology, 1995.

[16] C. Sander et al. Parser for protein folding units. Proteins, 1994.

[17] G J Williams et al. The Protein Data Bank: a computer-based archival file for macromolecular structures. Archives of Biochemistry and Biophysics, 1978.

[18] William J. Cook et al. Combinatorial Optimization. 1997.

[19] Anders Liljas et al. Recognition of structural domains in globular proteins. 1974.

[20] M G Rossmann et al. Letter: Molecular symmetry axes and subunit interfaces in certain dehydrogenases. Journal of Molecular Biology, 1973.

[21] M. Go. Correlation of DNA exonic regions with protein structural units in haemoglobin. Nature, 1981.

[22] Herbert Edelsbrunner et al. Algorithms in Combinatorial Geometry. EATCS Monographs in Theoretical Computer Science, 1987.

[23] J M Thornton et al. Domain assignment for protein structures using a consensus approach: Characterization and analysis. Protein Science, 1998.

[24] Christian Sander. Physical criteria for folding units of globular proteins. 1981.

[25] M Gerstein et al. Volume changes on protein folding. Structure, 1994.

[26] M G Rossmann et al. Comparison of super-secondary structures in proteins. Journal of Molecular Biology, 1973.

[27] G. Rose et al. Compact units in proteins. Biochemistry, 1986.

[28] N. Colloc'h et al. Comparison of three algorithms for the assignment of secondary structure in proteins: the advantages of a consensus assignment. Protein Engineering, 1993.

[29] T L Blundell et al. An automatic method involving cluster analysis of secondary structures for the identification of domains in proteins. Protein Science, 1995.

[30] F. Richards. The interpretation of protein structures: total volume, group volume distributions and packing density. Journal of Molecular Biology, 1974.

[31] J. Richardson et al. The anatomy and taxonomy of protein structure. Advances in Protein Chemistry, 1981.

[32] Alexander A. Rashin et al. Location of domains in globular proteins. Nature, 1981.

[33] David C. Jones et al. CATH: a hierarchic classification of protein domain structures. Structure, 1997.

[34] Laurence A. Wolsey et al. Integer and Combinatorial Optimization. Wiley-Interscience Series in Discrete Mathematics and Optimization, 1988.

[35] Shoshana J. Wodak et al. Location of structural domains in proteins. 1981.

[36] H. Scheraga et al. Prediction of the location of structural domains in globular proteins. Journal of Protein Chemistry, 1988.