A galaxy of folds

Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernible sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution to the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space that attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. Often, however, superfamilies of the same fold are also linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. These links arise from modular peptide fragments of between 20 and 40 residues that co‐occur in the connected folds in disparate structural contexts. The fragments may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many as yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
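The clustering idea described above, points in a 2D plane that pull together when their sequences are similar and push apart otherwise, can be illustrated with a toy force-directed layout. The sketch below is only an illustration under stated assumptions and is not the procedure used in the study: the function names (`layout`, `mean_distance`), the force law (linear attraction scaled by a similarity weight, 1/distance repulsion between all pairs), and every parameter value are hypothetical choices made for the example.

```python
import math
import random


def layout(n, similarity, steps=500, attract=0.05, repel=0.01,
           max_step=0.1, seed=0):
    """Toy force-directed layout (hypothetical, for illustration only).

    n          -- number of points (e.g. protein domains)
    similarity -- dict mapping (i, j) to a weight in [0, 1]; missing
                  pairs are treated as having no detectable similarity
    Returns a list of [x, y] positions.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                w = similarity.get((i, j), similarity.get((j, i), 0.0))
                # Positive magnitude pulls i toward j; negative pushes apart.
                mag = attract * w * dist - repel / dist
                fx, fy = mag * dx / dist, mag * dy / dist
                force[i][0] += fx
                force[i][1] += fy
                force[j][0] -= fx
                force[j][1] -= fy
        for p, f in zip(pos, force):
            norm = math.hypot(f[0], f[1])
            # Clamp the displacement so close pairs cannot blow up the layout.
            scale = min(1.0, max_step / norm) if norm > 0 else 0.0
            p[0] += f[0] * scale
            p[1] += f[1] * scale
    return pos


def mean_distance(pos, pairs):
    """Average Euclidean distance over the given index pairs."""
    return sum(math.dist(pos[i], pos[j]) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    # Two made-up "superfamilies" of three domains each, similar only
    # within their own group.
    group_a, group_b = [0, 1, 2], [3, 4, 5]
    sim = {}
    for group in (group_a, group_b):
        for a in group:
            for b in group:
                if a < b:
                    sim[(a, b)] = 0.9
    pos = layout(6, sim)
    print("within groups :", mean_distance(pos, list(sim)))
    print("between groups:",
          mean_distance(pos, [(a, b) for a in group_a for b in group_b]))
```

In this toy setting, domains that share similarity settle close together while unrelated groups drift apart, which is the qualitative behaviour the text describes for families within a superfamily. Adding a weak nonzero weight between two members of different groups would be one way to mimic the links between superfamilies discussed above.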
