ECOD: An Evolutionary Classification of Protein Domains

Understanding the evolution of a protein, including both its close and distant relationships, often provides insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures and present it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology) rather than by topology (or “fold”). This distinction highlights cases of homology between domains of differing topology and so aids understanding of how protein structure evolves. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as to conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families, which are clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with the PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating how ECOD can be used to further biological and evolutionary studies.
