Representation of the Protein Universe using Classifications, Maps, and Networks

A meaningful and coherent global picture of the protein universe is needed to better understand protein evolution and the underlying biophysics. We survey the studies that tackled this fundamental challenge, providing a glimpse of the protein space. A global picture represents all known local relationships among proteins, and needs to do so in a comprehensive and accurate manner. Three types of global representations can be used: classifications, maps, and networks. In these, the local relationships are derived from the similarity of the proteins' sequences, structures, or functions (or a combination of these). Alternatively, the local relationships can be co-occurrences of elements in the protein universe. The representations can be based on different objects: full polypeptide chains; fragments, such as structural domains; or even smaller motifs. Different protein qualities were revealed in each study; many point out the uniqueness of domains of the alpha/beta class in SCOP (Structural Classification of Proteins).
