A correlated motif approach for finding short linear motifs from protein interaction networks

Background

Short binding motifs are an important class of interaction switches in biological circuits and disease pathways. However, the biological experiments needed to find these binding motifs are often laborious and expensive. With the availability of protein interaction data, novel binding motifs can be discovered computationally by applying standard motif-extraction algorithms to sets of protein sequences, where each set interacts with either a common protein or a group of proteins with similar properties. The underlying assumption is that proteins with common interacting partners share some common binding motifs. Although novel binding motifs have been discovered with this approach, it is not applicable if a protein interacts with very few other proteins, or when prior knowledge of protein groups is unavailable or erroneous. Experimental noise in the input interaction data can further degrade the already poor performance of such approaches.

Results

We propose a novel approach of finding correlated short sequence motifs from protein-protein interaction data to circumvent the above-mentioned limitations. Correlated motifs are motifs that consistently co-occur only in pairs of interacting protein sequences, and that could interact with each other directly or indirectly to mediate interactions. We adopt the (l, d)-motif model and formulate the search for correlated motifs as an (l, d)-motif pair finding problem. We present both an exact algorithm, D-MOTIF, and its approximation algorithm, D-STAR, to solve this problem. Evaluation on extensive simulated data shows that our approach not only eliminates the need for any prior protein grouping, but is also more robust in extracting motifs from noisy interaction data. Application to two biological datasets (the SH3 interaction network and the TGFβ signaling network) demonstrates that the approach can extract correlated motifs that correspond to actual interacting subsequences.

Conclusion

The correlated motif approach outlined in this paper is able to find correlated linear motifs from sparse and noisy interaction data. This, in turn, will expedite the discovery of novel linear binding motifs and facilitate the study of the biological pathways mediated by them.
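To make the notion of a correlated (l, d)-motif pair concrete, the following is a minimal Python sketch. It is not D-MOTIF or D-STAR. It assumes the common reading of the (l, d)-motif model, in which a length-l pattern occurs in a sequence if some length-l substring differs from it in at most d positions, and it assumes that an interaction supports a motif pair when one motif occurs in each partner. The function names, the toy sequences, and the interaction list are all hypothetical and chosen only for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def pair_support(m1, m2, sequences, interactions, d):
    """Count interactions (p, q) where m1 occurs in one partner and m2 in the other."""
    count = 0
    for p, q in interactions:
        sp, sq = sequences[p], sequences[q]
        forward = occurs(m1, sp, d) and occurs(m2, sq, d)
        reverse = occurs(m2, sp, d) and occurs(m1, sq, d)
        if forward or reverse:
            count += 1
    return count


# Hypothetical toy data: five short sequences and three interactions.
sequences = {
    "A": "MKPPLPRRAA",
    "B": "GGYEWWDLTT",
    "C": "AAPPVPRKGG",
    "D": "TTYEWWELSS",
    "E": "MMMMMMMMMM",
}
interactions = [("A", "B"), ("C", "D"), ("A", "E")]

# With l = 5 and d = 1, two of the three interactions support the pair.
print(pair_support("PPLPR", "YEWWD", sequences, interactions, d=1))  # 2
```

A motif pair finder would search over candidate pairs for those with high support among interacting pairs and low support elsewhere. This sketch only scores a single given pair, which is the quantity such a search would optimize under the assumptions stated above.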
