Protein Structure Classification

The molecular basis of life rests on the activity of large biological macromolecules, including nucleic acids (DNA and RNA), carbohydrates, lipids, and proteins. Although each plays an essential role in life, proteins stand out as the lead performers of cellular functions. In response, structural molecular biology has emerged as a new line of experimental research focused on revealing the structure of these biomolecules. This branch of biology has recently received a major boost from the development of high-throughput structural studies aimed at building a comprehensive view of the protein structure universe. Although these studies are generating a wealth of information that is stored in protein structure databases, the key to their success lies in our ability to organize and analyze that information, and to integrate it with other efforts aimed at solving the mysteries behind cell functions. This survey describes the first step behind any such organization scheme, namely the classification of protein structures. The properties of protein structures are reviewed, with special attention to their geometry. Computer methods for the automatic comparison and classification of these structures are then reviewed, along with the existing classifications of protein structures and their applications in biology, with a special focus on computational biology. The chapter concludes with a discussion of the future of these classifications.
