Proteome-Wide Discovery of Evolutionary Conserved Sequences in Disordered Regions

A statistical analysis method can identify short, functionally important linear motifs in disordered regions of proteins.

Finding the Hidden Meaning in Disordered Regions

Many proteins, including those involved in signal transduction, have large disordered regions in addition to their clearly defined domains or motifs. Although these disordered regions are functionally important, identifying the important residues within them has proved challenging because the regions are not visualized in crystal structures and tend to exhibit high sequence divergence. Nguyen Ba et al. modified the phylogenetic hidden Markov model so that it could be applied to these disordered regions. Application of this method to yeast proteins not only revealed the presence of known short conserved motifs in proteins not known to have these motifs but also predicted previously unknown short conserved motifs. Experimental analysis suggested that both sets of motifs were functionally important. Thus, this approach should provide an effective method for discovering biologically important conserved motifs within the disordered regions of proteins.

At least 30% of human proteins are thought to contain intrinsically disordered regions, which lack a stable structural conformation. Despite lacking enzymatic functions and having few protein domains, disordered regions are functionally important for protein regulation and contain short linear motifs (short peptide sequences involved in protein-protein interactions), but in most disordered regions, the functional amino acid residues remain unknown. We searched for evolutionarily conserved sequences within disordered regions according to the hypothesis that conservation would indicate functional residues. Using a phylogenetic hidden Markov model (phylo-HMM), we made accurate, specific predictions of functional elements in disordered regions even when these elements are only two or three amino acids long. Among the conserved sequences that we identified were previously known and newly identified short linear motifs, and we experimentally verified key examples, including a motif that may mediate interaction between protein kinase Cbk1 and its substrates. We also observed that hub proteins, which interact with many partners in a protein interaction network, are highly enriched in these conserved sequences. Our analysis enabled the systematic identification of the functional residues in disordered regions and suggested that at least 5% of amino acids in disordered regions are important for function.
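To make the idea of segmenting an alignment into conserved and background columns more concrete, the following is a minimal sketch of a two-state hidden Markov model decoded with the Viterbi algorithm. It is not the phylo-HMM used in the study: the phylogenetic likelihood is replaced here by a crude "all residues identical" observation per column, and the toy alignment, the function names (`column_is_identical`, `viterbi_conserved`), and every probability value are illustrative assumptions.

```python
import math

def column_is_identical(column):
    """True when every sequence has the same non-gap residue in this column."""
    return "-" not in column and len(set(column)) == 1

def viterbi_conserved(alignment, p_ident=(0.2, 0.9), p_stay=(0.9, 0.8)):
    """Label each alignment column 0 (background) or 1 (conserved).

    p_ident[s]: assumed probability that state s emits an identical column.
    p_stay[s]:  assumed probability of remaining in state s at the next column.
    """
    n_cols = len(alignment[0])
    obs = [column_is_identical([seq[i] for seq in alignment])
           for i in range(n_cols)]

    def log_emit(state, identical):
        p = p_ident[state]
        return math.log(p if identical else 1.0 - p)

    def log_trans(prev, cur):
        p = p_stay[prev]
        return math.log(p if prev == cur else 1.0 - p)

    # Initialisation: uniform prior over the two states.
    score = [math.log(0.5) + log_emit(s, obs[0]) for s in (0, 1)]
    back = []  # back[t-1][cur] = best previous state for `cur` at column t

    # Recursion: keep the best-scoring path into each state at each column.
    for t in range(1, n_cols):
        new_score, pointers = [], []
        for cur in (0, 1):
            best_prev = max((0, 1),
                            key=lambda prev: score[prev] + log_trans(prev, cur))
            pointers.append(best_prev)
            new_score.append(score[best_prev]
                             + log_trans(best_prev, cur)
                             + log_emit(cur, obs[t]))
        score = new_score
        back.append(pointers)

    # Traceback from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy alignment: a three-residue block (KEN) flanked by
# divergent, gap-containing columns.
alignment = [
    "ASTGKENLQDS",
    "TPANKENIEEA",
    "SA-DKENVQNT",
    "ATSGKENLKDP",
]
labels = viterbi_conserved(alignment)
print("".join(str(x) for x in labels))
```

With these assumed parameters the three identical columns are labelled as conserved and the flanking columns as background. A real phylo-HMM would instead score each column under slow and fast evolutionary rates along a species tree, so the sketch only conveys the segmentation step.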
