dagLogo: An R/Bioconductor package for identifying and visualizing differential amino acid group usage in proteomics data

Sequence logos are widely used as graphical representations of conserved nucleic acid and protein motifs. Because of the complexity of the amino acid (AA) alphabet, rich post-translational modification, and the diverse subcellular localization of proteins, few versatile tools are available for effective identification and visualization of protein motifs. In addition, various reduced AA alphabets based on physicochemical, structural, or functional properties have been valuable in the study of protein alignment, folding, structure prediction, and evolution. However, there is a lack of tools for applying reduced AA alphabets to the identification and visualization of statistically significant motifs. To fill this gap, we developed dagLogo, an R/Bioconductor package that has several advantages over existing tools. First, dagLogo accepts input sets in various formats and provides comprehensive options for building optimal background models. Second, it implements different reduced AA alphabets to group AAs with similar properties. Third, dagLogo provides statistical and visual solutions for differential AA (or AA group) usage analysis of both large and small data sets. Case studies showed that dagLogo can better identify and visualize conserved protein sequence patterns from different types of inputs and can potentially reveal biological patterns that could be missed by other logo generators.
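The idea of differential AA group usage can be illustrated with a minimal Python sketch. This is not dagLogo's code or API: the four-group alphabet, the function names, the toy sequences, and the choice of a one-sided Fisher-style exact test are all assumptions made for illustration only.

```python
from math import comb

# Hypothetical reduced alphabet (illustration only, not a dagLogo scheme).
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGPH",
    "acidic": "DE",
    "basic": "KR",
}
AA_TO_GROUP = {aa: name for name, aas in GROUPS.items() for aa in aas}


def group_count(seqs, pos, group):
    """Count aligned sequences whose residue at `pos` belongs to `group`."""
    return sum(1 for s in seqs if AA_TO_GROUP[s[pos]] == group)


def enrichment_pvalue(a, n1, b, n2):
    """One-sided exact test for enrichment in the input set.

    a of n1 input sequences and b of n2 background sequences carry the
    group at a given position. Returns P(X >= a) under the hypergeometric
    null in which group membership is independent of the set.
    """
    total, carriers = n1 + n2, a + b
    upper = min(n1, carriers)
    tail = sum(
        comb(carriers, x) * comb(total - carriers, n1 - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(total, n1)


# Toy aligned peptides (made up for this example).
input_set = ["DEAD", "EEKA", "DDLA"]
background = ["AKLA", "LKVA", "DRIA", "GSTA"]

a = group_count(input_set, 0, "acidic")   # 3 of 3
b = group_count(background, 0, "acidic")  # 1 of 4
p = enrichment_pvalue(a, len(input_set), b, len(background))
print(f"acidic at position 0: {a}/{len(input_set)} vs {b}/{len(background)}, p = {p:.4f}")
```

Collapsing residues into groups before testing pools counts across similar amino acids, which is why a reduced alphabet can expose a position-specific preference that no single residue shows on its own.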
