Domain Interaction Footprint: a multi-classification approach to predict domain-peptide interactions

MOTIVATION: The flow of information within cellular pathways relies largely on specific protein-protein interactions. Many of these interactions are mediated by peptide recognition modules (PRMs), so discovering them is a fundamental step towards unravelling the complexity of diverse pathways. Because a peptide can be recognized by more than one PRM, and high-throughput experiments are both time-consuming and expensive, a computational method that narrows down the potential peptide ligands of a specific PRM is desirable. First, we present Domain Interaction Footprint (DIF), a new approach that predicts the peptides binding to PRMs based solely on the peptide sequence. Second, we show that our method can create a multi-classification model that assesses the binding specificity of a given peptide to all examined PRMs at once.

RESULTS: We first applied our approach to a previously investigated dataset of different SH3 domains and predicted their peptide ligands with exceptionally high accuracy. This result outperforms all recent methods trained on the same dataset. Furthermore, we used our technique to build two multi-classification models, one for SH3 domains and one for PDZ domains, each of which predicts the interaction preference between a peptide and every domain in the corresponding family at once. Because it predicts domain specificity reliably, our approach can be seen as a first step towards a complete multi-domain classification model comprising all domains of one family. Such a comprehensive domain specificity model would benefit the search for highly specific peptide ligands that interact solely with the domain of choice.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
