Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity

The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity‐based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks built with three different edge metrics: sequence similarity (BLAST), structure similarity (TM‐Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well‐studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs depending on the edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence‐ or structure‐based networks do. Sequence‐ and structure‐based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before choosing an edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity‐based networks reveal clusters that share functional details. They also lay the foundation for capturing functionally relevant hierarchies with an approach that is automatable and can deliver greater precision in function annotation than current similarity‐based methods.
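
The idea of reading clusters off a similarity network at a single edge threshold can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `cluster_at_threshold`, the toy protein labels, and the scores are all hypothetical, and the sketch assumes a score where higher means more similar (a BLAST E‐value, where lower is better, would first need to be transformed, for example to its negative logarithm). Clusters are taken to be the connected components that remain after edges below the threshold are dropped.

```python
from collections import defaultdict


def cluster_at_threshold(scores, threshold):
    """Return connected components of the network that keeps only
    edges whose similarity score is >= threshold.

    scores: dict mapping (node_a, node_b) -> similarity (higher = more similar).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving; registers unseen nodes.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)  # registers both nodes
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical pairwise scores for five made-up proteins.
toy_scores = {
    ("P1", "P2"): 0.9,
    ("P2", "P3"): 0.8,
    ("P3", "P4"): 0.3,
    ("P4", "P5"): 0.7,
}

clusters_loose = cluster_at_threshold(toy_scores, 0.2)   # one cluster
clusters_strict = cluster_at_threshold(toy_scores, 0.5)  # two clusters
print(clusters_loose)
print(clusters_strict)
```

Running the same function on score tables derived from different edge metrics, and comparing the resulting clusters with a curated functional hierarchy, is one simple way to explore how the choice of metric and threshold changes the topology.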
