Manual classification strategies in the ECOD database

ECOD (Evolutionary Classification Of protein Domains) is a comprehensive and up‐to‐date protein structure classification database. The majority of new structures released by the PDB (Protein Data Bank) each week already have close homologs in the ECOD hierarchy and thus can be reliably partitioned into domains and classified by software without manual intervention. However, proteins that lack confidently detectable homologs require careful analysis by experts. Although many bioinformatics resources rely on expert curation to some degree, specific examples of how this curation occurs and in what cases it is necessary are not always described. Here, we illustrate the manual classification strategy in ECOD by example, focusing on two major issues in protein classification: domain partitioning and the relationship between homology and similarity scores. Most examples show recently released and manually classified PDB structures. We discuss multi‐domain proteins, discordance between sequence and structural similarities, difficulties in assessing homology from similarity scores, and integral membrane proteins homologous to soluble proteins. By assimilating newly available structures into its hierarchy in a timely manner, ECOD strives to provide an accurate and up‐to‐date view of the protein structure world through combined computational and expert‐driven analysis. Proteins 2015; 83:1238–1251. © 2015 Wiley Periodicals, Inc.
