Structural Bridges through Fold Space

Several protein structure classification schemes exist that partition the protein universe into structural units called folds. Yet these schemes do not describe how these units sit relative to each other in a global structure space. In this paper we construct networks that describe such global relationships between folds in the form of structural bridges. We generate these networks using four different structural alignment methods across multiple score thresholds. The networks constructed using the different methods remain a similar distance apart regardless of the probability threshold defining a structural bridge. This suggests that at least some structural bridges are method specific and that any attempt to build a picture of structural space should not rely on a single structural superposition method. Despite these differences, all representations agree on an organisation of fold space into five principal community structures: all-α, all-β sandwiches, all-β barrels, α/β and α + β. We project estimated fold ages onto the networks and find not only that pairs of unconnected folds are associated with higher age differences than bridged folds, but also that this difference increases with the number of networks displaying an edge. We also examine different centrality measures for folds within the networks and how these relate to fold age. While these measures interpret the central core of fold space in varied ways, they all show that ancestral folds tend to fall within this core and that more recently evolved structures make up the periphery. These findings suggest that evolutionary information is encoded along these structural bridges. Finally, we identify four highly central pivotal folds, representing dominant topological features, which act as key attractors within our landscapes.
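The network construction and the age comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy scores and ages, and the convention that a higher score means greater structural similarity are all assumptions made for illustration. Each method's pairwise scores are thresholded into a set of bridges, the number of networks supporting each fold pair is counted, and the mean absolute age difference is reported per support level.

```python
from itertools import combinations
from statistics import mean


def bridge_edges(scores, threshold):
    """Return the fold pairs whose score meets the threshold.

    Assumption: higher scores mean greater structural similarity.
    """
    return {frozenset(pair) for pair, s in scores.items() if s >= threshold}


def edge_support(networks):
    """Count, for each fold pair, how many networks contain it as an edge."""
    support = {}
    for edges in networks:
        for edge in edges:
            support[edge] = support.get(edge, 0) + 1
    return support


def age_gap_by_support(ages, support):
    """Mean absolute age difference of all fold pairs, grouped by support.

    A support of 0 means no network bridges the pair.
    """
    gaps = {}
    for a, b in combinations(sorted(ages), 2):
        k = support.get(frozenset((a, b)), 0)
        gaps.setdefault(k, []).append(abs(ages[a] - ages[b]))
    return {k: mean(v) for k, v in gaps.items()}


# Toy data (hypothetical): three folds, two alignment methods.
ages = {"A": 0.9, "B": 0.8, "C": 0.2}
method1 = {("A", "B"): 0.7, ("A", "C"): 0.3, ("B", "C"): 0.6}
method2 = {("A", "B"): 0.8, ("A", "C"): 0.2, ("B", "C"): 0.4}

networks = [bridge_edges(m, threshold=0.5) for m in (method1, method2)]
support = edge_support(networks)
result = age_gap_by_support(ages, support)

for k in sorted(result):
    print(f"pairs bridged in {k} network(s): mean age gap {result[k]:.2f}")
```

On real data, the per-group age differences would be compared with a rank-based test rather than by their means alone, and the thresholding would be repeated across the range of score thresholds.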
