Positive-unlabeled ensemble learning for kinase substrate prediction from dynamic phosphoproteomics data

MOTIVATION Protein phosphorylation is a post-translational modification that underlies various aspects of cellular signaling. A key step in reconstructing signaling networks is identifying the set of all kinases and their substrates. Experimental characterization of kinase substrates is both expensive and time-consuming. To expedite the discovery of novel substrates, computational approaches based on kinase recognition sequences (motifs) from known substrates, protein structure, interaction and co-localization have been proposed. However, these methods rarely take into account the dynamic responses of signaling cascades measured from in vivo cellular systems. Given that recent advances in mass spectrometry-based technologies make it possible to quantify phosphorylation on a proteome-wide scale, computational approaches that can integrate static features with dynamic phosphoproteome data would greatly facilitate the prediction of biologically relevant kinase-specific substrates.

RESULTS Here, we propose a positive-unlabeled ensemble learning approach that integrates dynamic phosphoproteomics data with static kinase recognition motifs to predict novel substrates for kinases of interest. We extended a positive-unlabeled learning technique to an ensemble model, which significantly improves prediction sensitivity on novel substrates of kinases while retaining high specificity. We evaluated the performance of the proposed model using simulation studies and subsequently applied it to predict novel substrates of key kinases relevant to insulin signaling. Our analyses show that static sequence motifs and dynamic phosphoproteomics data are complementary, and that the proposed integrated model predicts kinase-specific substrates more accurately than methods relying only on static information.

AVAILABILITY AND IMPLEMENTATION An executable GUI tool, source code and documentation are freely available at https://github.com/PengyiYang/KSP-PUEL.

CONTACT pengyi.yang@nih.gov or jothi@mail.nih.gov

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
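
To make the idea of positive-unlabeled ensemble learning concrete, the following is a minimal sketch and not the authors' implementation. It assumes each phosphosite is described by a motif score followed by a short temporal profile, and it treats random subsamples of the unlabeled sites as provisional negatives. The abstract does not specify a base learner, so a nearest-centroid scorer stands in here; the function names (`fit_base`, `pu_ensemble`) and the toy data are hypothetical.

```python
import math
import random


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_base(pos, neg):
    """Nearest-centroid stand-in for a base classifier; returns a scorer in (0, 1)."""
    cp, cn = centroid(pos), centroid(neg)

    def score(x):
        # The margin is positive when x lies closer to the positive centroid.
        margin = sq_dist(x, cn) - sq_dist(x, cp)
        margin = max(-50.0, min(50.0, margin))  # clamp to keep exp() in range
        return 1.0 / (1.0 + math.exp(-margin))

    return score


def pu_ensemble(pos, unlabeled, n_models=25, seed=0):
    """Train n_models base learners, each on all positives versus a random
    subsample of the unlabeled set used as provisional negatives, and
    return a function that averages their scores."""
    rng = random.Random(seed)
    k = min(len(pos), len(unlabeled))
    models = [fit_base(pos, rng.sample(unlabeled, k)) for _ in range(n_models)]

    def predict(x):
        return sum(m(x) for m in models) / len(models)

    return predict


# Toy features (assumed layout): [motif score, response at t1, t2, t3].
known = [[0.90, 1.0, 1.8, 2.0], [0.80, 0.9, 1.7, 2.1], [0.85, 1.1, 1.9, 1.9]]
unlabeled = [
    [0.10, 0.0, 0.1, -0.1], [0.20, -0.1, 0.0, 0.1], [0.15, 0.1, -0.2, 0.0],
    [0.30, 0.2, 0.1, 0.0], [0.05, -0.2, 0.1, 0.1], [0.25, 0.0, 0.0, -0.1],
    [0.88, 1.0, 1.6, 2.0],  # resembles the known substrates but is unlabeled
]

predict = pu_ensemble(known, unlabeled)
hidden_score = predict(unlabeled[-1])
background_score = predict(unlabeled[0])
```

Averaging over many subsamples limits the damage done when an unlabeled true substrate is drawn as a provisional negative, since it contaminates only some of the base learners. In this toy example the unlabeled site that resembles the known substrates scores well above the background sites.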
