SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences

Abstract

Motivation: Accurate prediction of protein-binding residues (PBRs) enhances understanding of the molecular-level rules governing protein–protein interactions, helps protein–protein docking and facilitates annotation of protein functions. Recent studies show that current sequence-based predictors of PBRs severely cross-predict residues that interact with other types of partners (e.g. RNA and DNA) as PBRs. Moreover, these methods are relatively slow, which prohibits genome-scale use.

Results: We propose a novel, accurate and fast sequence-based predictor of PBRs that minimizes cross-predictions. Our SCRIBER (SeleCtive pRoteIn-Binding rEsidue pRedictor) method takes advantage of three innovations: a comprehensive dataset that covers multiple types of binding residues, novel types of inputs that are relevant to the prediction of PBRs, and an architecture that is tailored to reduce cross-predictions. The dataset includes complete protein chains and offers improved coverage of binding annotations, which are transferred from multiple protein–protein complexes. We use an innovative two-layer architecture in which the first layer generates predictions of protein-binding, RNA-binding, DNA-binding and small ligand-binding residues. The second layer re-predicts PBRs by reducing the overlap between PBRs and the other types of binding residues predicted in the first layer. Empirical tests on an independent test dataset reveal that SCRIBER significantly outperforms current predictors and that all three innovations contribute to its high predictive performance. SCRIBER reduces cross-predictions by between 41% and 69%, and our conservative estimates show that it is at least 3 times faster. We provide putative PBRs produced by SCRIBER for the entire human proteome and use these results to hypothesize that about 14% of currently known human protein domains bind proteins.

Availability and implementation: The SCRIBER webserver is available at http://biomine.cs.vcu.edu/servers/SCRIBER/.

Supplementary information: Supplementary data are available at Bioinformatics online.
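
The two-layer idea described in the Results can be illustrated with a minimal sketch. This is not the SCRIBER implementation: the logistic scoring functions, the two toy features per residue, and all names and weights (TOY_WEIGHTS, TOY_COMBINE, layer_one, layer_two, predict_chain) are assumptions made purely for illustration. The sketch only shows how a second layer could lower the protein-binding score of a residue that the first layer also scores highly for RNA, DNA or ligand binding.

```python
import math
from typing import Dict, List, Sequence

# The four binding types produced by the first layer, as named in the abstract.
BINDING_TYPES = ("protein", "rna", "dna", "ligand")


def sigmoid(x: float) -> float:
    """Map a real-valued score to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_one(features: Sequence[float],
              weights: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """First layer: one propensity per binding type for a single residue.

    A logistic model per type is an assumption chosen for brevity.
    """
    return {
        t: sigmoid(sum(w * f for w, f in zip(weights[t], features)))
        for t in BINDING_TYPES
    }


def layer_two(scores: Dict[str, float],
              combine: Dict[str, float],
              bias: float = 0.0) -> float:
    """Second layer: re-predict protein binding from all first-layer scores.

    Negative weights on the non-protein types penalize residues that look
    like RNA-, DNA- or ligand-binding residues (hypothetical weighting).
    """
    z = bias + sum(combine[t] * scores[t] for t in BINDING_TYPES)
    return sigmoid(z)


def predict_chain(chain_features: Sequence[Sequence[float]],
                  weights: Dict[str, Sequence[float]],
                  combine: Dict[str, float]) -> List[float]:
    """Return one re-predicted protein-binding propensity per residue."""
    return [layer_two(layer_one(f, weights), combine) for f in chain_features]


# Hypothetical toy parameters: feature 0 favors protein binding,
# feature 1 favors nucleic acid binding.
TOY_WEIGHTS = {
    "protein": [2.0, 0.0],
    "rna": [0.0, 2.0],
    "dna": [0.0, 1.0],
    "ligand": [0.0, 0.0],
}
TOY_COMBINE = {"protein": 4.0, "rna": -3.0, "dna": -1.0, "ligand": 0.0}

if __name__ == "__main__":
    # Both residues get the same first-layer protein score; the second one
    # also looks nucleic-acid-binding and is therefore scored lower.
    chain = [[2.0, 0.0], [2.0, 2.0]]
    for residue, score in zip(chain, predict_chain(chain, TOY_WEIGHTS, TOY_COMBINE)):
        print(residue, round(score, 3))
```

In this toy setup the first layer assigns both residues the same protein-binding score, and only the second layer separates them, which is the behavior the abstract attributes to the second layer in general terms.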
