Toward consistent assignment of structural domains in proteins.

The assignment of protein domains from three-dimensional structure is critically important in understanding protein evolution and function, yet little quality assurance has been performed. Here, the differences in the assignment of structural domains are evaluated using six common assignment methods. Three human expert methods (AUTHORS, i.e. the authors' annotation; CATH; and SCOP) and three fully automated methods (DALI, DomainParser and PDP) are investigated by analysis of individual methods against the authors' assignment, as well as by analysis based on the consensus among groups of methods (expert only, automatic only, and combined). The results demonstrate that caution is recommended in using current domain assignments, and indicate where additional work is needed. Specifically, the major factors responsible for conflicting domain assignments between methods, both expert and automatic, are: (1) the definition of very small domains; (2) splitting secondary structures between domains; (3) the size and number of discontinuous domains; (4) closely packed or convoluted domain-domain interfaces; (5) structures with large and complex architectures; and (6) the level of significance placed upon structural, functional and evolutionary concepts in considering structural domain definitions. A web-based resource that focuses on the results of benchmarking and the analysis of domain assignments is available at
