Protein-protein interaction site prediction through combining local and global features with deep neural networks

MOTIVATION Protein-protein interactions (PPIs) play important roles in many biological processes. Conventional biological experiments for identifying PPI sites are costly and time-consuming, so many computational approaches have been proposed to predict them. Existing computational methods usually rely on local contextual features alone. However, global features of the protein sequence are also critical for PPI site prediction.

RESULTS We propose DeepPPISP, a new end-to-end deep learning framework that combines local contextual and global sequence features for PPI site prediction. For local contextual features, we use a sliding window to capture the features of the neighbors of a target amino acid, as in previous studies. For global sequence features, a text convolutional neural network extracts features from the whole protein sequence. The local contextual and global sequence features are then combined to predict PPI sites. By integrating both kinds of features, DeepPPISP achieves state-of-the-art performance and outperforms the competing methods. To investigate whether global sequence features help our deep learning model, we remove or change some components of DeepPPISP. Detailed analyses show that global sequence features play an important role in DeepPPISP.

AVAILABILITY AND IMPLEMENTATION The DeepPPISP web server is available at http://bioinformatics.csu.edu.cn/PPISP/. The source code can be obtained from https://github.com/CSUBioGroup/DeepPPISP.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
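As a rough illustration of how local and global features can be combined per residue, the following minimal sketch uses only the Python standard library. It is not the DeepPPISP implementation: the one-hot input encoding, the window half-width, the kernel widths, the random kernel weights, and all function names are assumptions chosen for the example, and the final classification layers are omitted.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed input alphabet for this sketch


def one_hot(sequence):
    """Encode each residue as a 20-dimensional one-hot row (unknown residues stay all zero)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1.0
        rows.append(row)
    return rows


def local_window(features, center, half_width):
    """Flatten the rows within half_width of center, zero-padding past the sequence ends."""
    dim = len(features[0])
    flat = []
    for pos in range(center - half_width, center + half_width + 1):
        flat.extend(features[pos] if 0 <= pos < len(features) else [0.0] * dim)
    return flat


def global_text_cnn(features, kernels):
    """Text-CNN style summary: slide each kernel over the whole sequence,
    apply ReLU, and keep the maximum activation (max-over-positions pooling)."""
    pooled = []
    for kernel in kernels:  # kernel = list of `width` weight rows
        width = len(kernel)
        best = 0.0  # ReLU floor; also the value if the sequence is shorter than the kernel
        for start in range(len(features) - width + 1):
            window = features[start:start + width]
            activation = sum(
                w * x
                for k_row, f_row in zip(kernel, window)
                for w, x in zip(k_row, f_row)
            )
            best = max(best, activation)
        pooled.append(best)
    return pooled


def combined_features(sequence, half_width, kernels):
    """Concatenate each residue's local window with the shared global vector."""
    feats = one_hot(sequence)
    global_vec = global_text_cnn(feats, kernels)
    return [local_window(feats, i, half_width) + global_vec for i in range(len(feats))]


# Hypothetical settings: two random kernels of widths 3 and 5, window of 7 residues.
rng = random.Random(0)
kernels = [
    [[rng.uniform(-1.0, 1.0) for _ in AMINO_ACIDS] for _ in range(width)]
    for width in (3, 5)
]
vectors = combined_features("MKTAYIAKQR", half_width=3, kernels=kernels)
print(len(vectors), len(vectors[0]))  # 10 residues, 7 * 20 + 2 = 142 features each
```

Each residue receives its own local window but the same global vector, which is the sense in which whole-sequence context is shared across all positions. In a trained model the kernel weights would be learned and the concatenated vector would feed a classifier that outputs an interaction probability per residue.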
