A rapid method for exploring the protein structure universe

We have developed an automatic protein fingerprinting method for the evaluation of protein structural similarities based on secondary structure element compositions, spatial arrangements, lengths, and topologies. This method can rapidly identify proteins sharing structural homologies, as we demonstrate with five test cases: the globins, the mammalian trypsinlike serine proteases, the immunoglobulins, the cupredoxins, and the actinlike ATPase domain‐containing proteins. Principal components analysis of the similarity distance matrix calculated from an all‐by‐all comparison of 1,031 unique chains in the Protein Data Bank has produced a distribution of structures within a high‐dimensional structural space. Fifty percent of the variance observed for this distribution is bounded by six axes, two of which encode structural variability within two large families, the immunoglobulins and the trypsinlike serine proteases. Many aspects of the spatial distribution remain stable upon reduction of the database to 140 proteins with minimal family overlap, although the axes correlated with specific structural families are no longer observed in the reduced set. A clear hierarchy of organization is seen in the arrangement of protein structures in the universe. At the highest level, protein structures populate regions corresponding to the all‐α, all‐β, and α/β superfamilies. Large protein families are arranged along family‐specific axes, forming local densely populated regions within the space. The lowest level of organization is intrafamilial; homologous structures are ordered by variations in peripheral secondary structure elements or by conformational shifts in the tertiary structure. Proteins 1999;34:317–332. © 1999 Wiley‐Liss, Inc.
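
The analysis step described above (principal components analysis of an all‐by‐all distance matrix, followed by counting the axes that bound half of the variance) can be illustrated with a minimal sketch. This is not the authors' implementation: the fingerprint vectors are random toy data, the distance is plain Euclidean, and the function names `pca_of_distance_matrix` and `axes_for_variance` are hypothetical. The sketch assumes each row of the distance matrix is treated as a feature vector for one structure.

```python
import numpy as np


def pca_of_distance_matrix(dist):
    """PCA in which each row of a distance matrix is one observation.

    Returns the projected coordinates and the fraction of variance
    explained by each principal axis (sorted in descending order).
    """
    dist = np.asarray(dist, dtype=float)
    centered = dist - dist.mean(axis=0)  # center every column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2                    # proportional to the PCA eigenvalues
    explained = variance / variance.sum()
    coords = u * s                       # scores along each principal axis
    return coords, explained


def axes_for_variance(explained, fraction=0.5):
    """Smallest number of leading axes whose cumulative variance reaches `fraction`."""
    cumulative = np.cumsum(explained)
    count = int(np.searchsorted(cumulative, fraction)) + 1
    return min(count, len(explained))


# Toy stand-in for structural fingerprints (assumption: 30 chains, 8 features).
rng = np.random.default_rng(0)
fingerprints = rng.normal(size=(30, 8))

# All-by-all Euclidean distances between the toy fingerprints.
diff = fingerprints[:, None, :] - fingerprints[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

coords, explained = pca_of_distance_matrix(dist)
n_axes = axes_for_variance(explained, fraction=0.5)
print(f"{n_axes} axes account for at least 50% of the variance")
```

On real data, `dist` would be replaced by the similarity distance matrix computed from the fingerprint comparison, and the leading columns of `coords` would give the positions of the structures along the principal axes.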
