Biocuration in the structure–function linkage database: the anatomy of a superfamily

Abstract

With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, assigning molecular function to a sequence has become a bottleneck. This article describes the biocuration process and methods used in the Structure-Function Linkage Database (SFLD) to help address some of the resulting challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry. We also present the Enzyme Structure-Function Ontology (ESFO), which was designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD and which guides the biocuration processes within the SFLD.

Database URL: http://sfld.rbvi.ucsf.edu/
