Large-Scale Discovery and Characterization of Protein Regulatory Motifs in Eukaryotes

The increasing ability to generate large-scale, quantitative proteomic data has brought with it the challenge of analyzing such data to discover the sequence elements that underlie systems-level protein behavior. Here we show that short, linear protein motifs can be efficiently recovered from proteome-scale datasets, such as subcellular localization, molecular function, half-life, and protein abundance data, using an information-theoretic approach. Using this approach, we have identified many known protein motifs, such as phosphorylation sites and localization signals, and discovered a large number of candidate elements. We estimate that ~80% of these candidates are novel predictions, in that they do not match a known motif in both sequence and biological context, suggesting that post-translational regulation of protein behavior is still largely unexplored. Many of the predicted motifs display preferential association with specific biological pathways and non-random positioning in the linear protein sequence, and together they provide focused hypotheses for experimental validation.
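The abstract does not spell out how the information-theoretic scoring works. One common way to make the idea concrete, offered here only as an illustrative sketch and not as the authors' method, is to measure the mutual information between the presence of a candidate motif and a discrete protein property such as localization. The function names (`mutual_information`, `motif_score`), the regular-expression motif representation, and the toy sequences below are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # Each term is p(x, y) * log2( p(x, y) / (p(x) * p(y)) ),
        # written with raw counts to avoid repeated divisions.
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def motif_score(pattern, proteins):
    """Score a motif (hypothetical helper).

    `proteins` is a list of (sequence, category) pairs; the score is the
    mutual information between "motif occurs in the sequence" and the category.
    """
    regex = re.compile(pattern)
    present = [bool(regex.search(seq)) for seq, _ in proteins]
    labels = [category for _, category in proteins]
    return mutual_information(present, labels)


# Toy data, invented for illustration only (not taken from any dataset).
toy_proteins = [
    ("MSPKKKRKVEDL", "nucleus"),
    ("MAAPKKKRKVGG", "nucleus"),
    ("MLSLRQSIRFFK", "mitochondrion"),
    ("MKLSTAVLLGGA", "cytoplasm"),
]

if __name__ == "__main__":
    for candidate in ("KKKRKV", "WWWW"):
        print(candidate, motif_score(candidate, toy_proteins))
```

In this toy set, a motif that occurs in exactly the nuclear proteins scores high, while a motif that occurs nowhere scores zero because it carries no information about the category. A real analysis would also need a significance test (for example, against shuffled labels) and a correction for sequence homology, neither of which is shown here.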
