Review: what can structural classifications reveal about protein evolution?

In this article we review the methods used to compare and classify protein structures. We discuss the hierarchies and populations of fold groups and evolutionary families in some of the major classifications, and we consider some of the problems confronting any general analysis of structural evolution in protein families. We also review more recent analyses that have expanded these classifications by identifying sequence relatives in the genomes, thereby revealing interesting trends in fold usage and recurrence.
