D-SLIMMER: domain-SLiM interaction motifs miner for sequence-based protein-protein interaction data.

Many biologically important protein-protein interactions (PPIs) have been found to be mediated by short linear motifs (SLiMs). In these interactions, a protein domain, often with a nonlinear interaction interface, binds to a SLiM. We propose a method called D-SLIMMER that mines PPI data for SLiMs on the basis of the interaction density between a nonlinear motif (i.e., a protein domain) in one protein and a SLiM in the other protein. On a benchmark of 113 experimentally verified reference SLiMs, D-SLIMMER notably outperformed existing methods at discovering domain-SLiM interaction motifs. To illustrate the significance of the SLiMs detected, we highlight two SLiMs discovered from the PPI data by D-SLIMMER that are variants of known ELM SLiMs, as well as a literature-backed SLiM that is yet to be listed in the reference databases. We also present a novel SLiM predicted by D-SLIMMER that is strongly supported by the existing biological literature. These examples show that D-SLIMMER is able to find biologically relevant SLiMs.
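The abstract does not define "interaction density" precisely, so the following is only a minimal sketch under an assumed definition: the fraction of (domain-containing protein, SLiM-containing protein) pairs that are observed to interact in the PPI data. The function name `interaction_density`, the regular-expression representation of a SLiM, and the toy proteins are all hypothetical and are not taken from D-SLIMMER itself.

```python
import re
from itertools import product


def interaction_density(domain_proteins, motif_regex, sequences, interactions):
    """Assumed definition (not from the paper): the fraction of
    (domain protein, motif protein) pairs that appear in the PPI data."""
    pattern = re.compile(motif_regex)
    # Proteins whose sequence contains at least one occurrence of the SLiM.
    motif_proteins = {p for p, seq in sequences.items() if pattern.search(seq)}
    # Treat interactions as undirected pairs.
    observed = {frozenset(pair) for pair in interactions}
    # All candidate pairs between the two groups, excluding self-pairs.
    pairs = [
        (d, m)
        for d, m in product(sorted(domain_proteins), sorted(motif_proteins))
        if d != m
    ]
    if not pairs:
        return 0.0
    hits = sum(1 for d, m in pairs if frozenset((d, m)) in observed)
    return hits / len(pairs)


# Toy data (hypothetical): D1 and D2 carry the domain; P1 and P2 carry a
# PxxP-like motif, written here as the regular expression "P..P".
sequences = {
    "D1": "MAAGGSEE",
    "D2": "MKKWWEE",
    "P1": "MKLPPLPARS",
    "P2": "AAPTLPPRGG",
    "P3": "MGGSSNNQQ",
}
interactions = [("D1", "P1"), ("D1", "P2"), ("D2", "P1"), ("D2", "P3")]

density = interaction_density({"D1", "D2"}, "P..P", sequences, interactions)
print(density)  # 3 of the 4 candidate pairs interact
```

Under this assumed definition, a domain-SLiM pair scores highly when proteins carrying the domain interact with proteins carrying the motif far more often than not. A real miner would also need to control for motif frequency and evolutionary relatedness, which this sketch ignores.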
