Interactome-wide prediction of short, disordered protein interaction motifs in humans.

Many of the specific functions of intrinsically disordered protein segments are mediated by Short Linear Motifs (SLiMs) interacting with other proteins. Well-known examples include SLiMs that interact with 14-3-3, PDZ, SH2, SH3, and WW domains, but the true extent and diversity of SLiM-mediated interactions are largely unknown. Here, we attempt to expand our knowledge of human SLiMs by applying in silico SLiM prediction to the human interactome. Combining data from seven different interaction databases, we analysed approximately 6000 protein-centred and 1600 domain-centred human interaction datasets, each consisting of three or more unrelated proteins that interact with a common partner. Results were placed in context through comparison to randomised datasets of similar size and composition. The search returned thousands of evolutionarily conserved, intrinsically disordered occurrences of hundreds of significantly enriched recurring motifs, including many that have never been previously identified. In addition to True Positive results for at least 25 different known SLiMs, a striking number of "off-target" proteins/domains also returned significantly enriched known motifs. Often, this was due to the non-independence of the datasets, with many proteins sharing interaction partners or contributing interactions to multiple domain datasets. The majority of these motif classes, however, were also found to be significantly enriched in one or more randomised datasets. This highlights the need for care when interpreting motif predictions of this nature, but also raises the possibility that SLiM occurrences may be successfully identified independently of interaction data. Although the results were not as compositionally biased as those of previous studies, patterns matching known SLiMs tended to cluster into a few large groups of similar sequence, while novel predictions tended to be more distinctive and less abundant. Whether this is due to ascertainment bias or a true functional composition bias of SLiMs is not clear and warrants further investigation.
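
As an illustration of what a protein-centred interaction dataset looks like, the grouping step can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the function name `hub_datasets`, the pair-list input format, and the identifiers `HUB1`, `P1`, etc. are all hypothetical, and the sketch does not filter out evolutionarily related proteins, which the analysis described above requires.

```python
from collections import defaultdict


def hub_datasets(interactions, min_size=3):
    """Group interactors by shared partner and keep partners with
    at least `min_size` distinct interactors.

    `interactions` is an iterable of (protein_a, protein_b) pairs.
    Hypothetical helper: relatedness filtering is NOT performed here.
    """
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        # Treat interactions as undirected: each protein is a
        # potential common partner ("hub") for the other.
        partners[a].add(b)
        partners[b].add(a)
    return {
        hub: sorted(members)
        for hub, members in partners.items()
        if len(members) >= min_size
    }


# Toy input with placeholder identifiers (not real data).
pairs = [
    ("HUB1", "P1"),
    ("HUB1", "P2"),
    ("P3", "HUB1"),
    ("HUB2", "P1"),
    ("HUB1", "P1"),  # duplicate record, e.g. from a second database
]
datasets = hub_datasets(pairs)
print(datasets)
```

Storing interactors in sets collapses duplicate records that arise when several databases report the same interaction, so the size threshold counts distinct proteins rather than database entries.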
