An alternative view of protein fold space

The comparison and subsequent classification of protein structure information have received significant attention as the number of experimentally derived 3‐dimensional structures has grown. Classification schemes have focused on the biological function found within protein domains and on structure classification based on topology. Here an alternative view is presented that groups substructures. Substructures are long (50–150 residues), highly repetitive, near‐contiguous pieces of polypeptide chain that occur frequently in a set of proteins from the PDB defined as structurally non‐redundant over the complete polypeptide chain. The substructure classification is based on the previously reported Combinatorial Extension (CE) algorithm, which provides a significantly different set of structure alignments from those previously described, having, for example, only a 40% overlap with FSSP. Qualitatively, the algorithm provides longer contiguous aligned segments at the price of a slightly higher root‐mean‐square deviation (rmsd). Clustering these alignments gives a discrete and highly repetitive set of substructures not detectable by sequence similarity alone. In some cases different substructures represent all or different parts of well‐known folds, indicative of the Russian doll effect, that is, the continuity of protein fold space. In other cases they fall into different structural and functional classifications. It is too early to determine whether these newly classified substructures represent new insights into the evolution of a structural framework important to many proteins. What is apparent from ongoing work is that these substructures have the potential to be useful probes in finding remote sequence homology and in structure prediction studies. The characteristics of the complete all‐by‐all comparison of the polypeptide chains present in the PDB, and details of the filtering procedure by pairwise structure alignment that led to the emergent substructure gallery, are discussed.
Substructure classification, alignments, and tools to analyze them are available at http://cl.sdsc.edu/ce.html. Proteins 2000;38:247–260. © 2000 Wiley‐Liss, Inc.
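As a reminder of what the rmsd figure quoted above measures, the following minimal Python sketch computes the root‐mean‐square deviation over a set of aligned residue pairs. It is an illustration only, not the CE implementation: it assumes the two coordinate lists are already optimally superposed and paired by the alignment, and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates, e.g. C-alpha atoms of aligned residue pairs.

    Assumption: the structures are already superposed; no rotation or
    translation is applied here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("aligned coordinate lists must have equal length")
    if not coords_a:
        raise ValueError("at least one aligned pair is required")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy data (hypothetical): two aligned pairs, each displaced by 1 unit.
segment_a = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
segment_b = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(rmsd(segment_a, segment_b))
```

Because the sum is divided by the number of aligned pairs, extending an alignment into less well‐matched regions tends to raise the rmsd, which is the trade‐off between segment length and deviation that the abstract describes.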
