Navigating Among Known Structures in Protein Space.

Present-day protein space is the result of 3.7 billion years of evolution, constrained by the underlying physicochemical properties of proteins. It is difficult to differentiate between evolutionary traces and the effects of physicochemical constraints. Nonetheless, as a rule of thumb, instances of structural reuse, identified by focusing on structural similarity, are likely attributable to physicochemical constraints, whereas instances of sequence reuse, identified by focusing on sequence similarity, may be more indicative of evolutionary relationships. Both types of relationships have been studied and can provide meaningful insights into protein biophysics and evolution, which in turn can lead to better algorithms for protein search, annotation, and maybe even design.
In broad strokes, studies of protein space vary in the entities they represent, the similarity measure used to compare these entities, and the representation used. The entities can be, for example, protein chains, domains, supra-domains, or smaller protein sub-parts denoted themes. The measures of similarity between the entities can be based on sequence, structure, function, or any combination of these. The representation can be global, encompassing the whole space, or local, focusing on a particular region surrounding one or more proteins of interest. Global representations include lists of grouped proteins, protein networks, and maps. Networks are the abstraction derived most directly from the similarity data: each node is a protein entity (e.g., a domain), and edges connect similar entities. Selecting the entities, the similarity measure, and the abstraction are three intertwined decisions: the similarity measures allow us to identify the entities, and the selection of entities influences what is a meaningful similarity measure. Similarly, we seek entities whose relationships to each other can be described succinctly and accurately by a simple representation.
This chapter covers studies that rely on different entities, similarity measures, and a range of representations to better understand protein structure space. Scholars may use publicly available navigators offering a global representation, in particular the hierarchical classifications SCOP, CATH, and ECOD, or navigators offering a local representation, which encompass structural alignment algorithms. Alternatively, scholars can configure their own navigator using existing tools. To demonstrate this DIY (do it yourself) approach to navigating protein space, we investigate substrate-binding proteins. By presenting sequence similarities among the members of this large and diverse protein family as a network, we can infer that one member (PDB ID 4ntl; of as yet unknown function) may bind methionine, and suggest a putative binding mechanism.
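The network abstraction described above (nodes are protein entities, edges connect similar ones) can be illustrated with a minimal Python sketch. It is not the chapter's actual pipeline: the entity names, the similarity scores, the 0.5 threshold, and the helper names `build_network` and `connected_components` are all hypothetical, chosen only to show how pairwise similarity data might be turned into a network and grouped.

```python
from collections import defaultdict


def build_network(similarities, threshold):
    """Build an undirected network from pairwise similarity scores.

    Nodes are protein entities; an edge joins two entities whose
    score is at least `threshold`. Returns a dict of adjacency sets.
    """
    graph = defaultdict(set)
    for (a, b), score in similarities.items():
        # Touch both keys so that entities without any edge still appear as nodes.
        graph[a]
        graph[b]
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Group nodes that are linked by a path of edges (depth-first search)."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical entities and similarity scores, for illustration only.
toy_scores = {
    ("domA", "domB"): 0.9,
    ("domB", "domC"): 0.7,
    ("domA", "domC"): 0.2,
    ("domC", "domD"): 0.1,
    ("domD", "domE"): 0.8,
}

toy_network = build_network(toy_scores, threshold=0.5)
toy_groups = connected_components(toy_network)
```

With these toy values, `domA` and `domC` end up in the same group even though their direct score is below the threshold, because both are similar to `domB`. This kind of transitive link is what lets a network view suggest a relationship for an entity of unknown function through its neighbors.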
