A Transformer-Based Ensemble Framework for the Prediction of Protein–Protein Interaction Sites

The identification of protein–protein interaction (PPI) sites is essential to the study of protein function and the discovery of new drugs. A variety of machine-learning-based computational tools have been developed to accelerate the identification of PPI sites. However, existing methods suffer from either low predictive accuracy or a limited scope of application. Specifically, some methods learn only global or only local sequential features, which leads to low predictive accuracy, while others achieve improved performance by extracting residue interactions from structures but are limited in scope because they depend heavily on precise structural information. There is an urgent need for a method that integrates comprehensive information to enable accurate, proteome-wide profiling of PPI sites. Herein, a novel ensemble framework for PPI site prediction, EnsemPPIS, is proposed based on transformer and gated convolutional networks. EnsemPPIS effectively captures not only global and local patterns but also residue interactions. It is unique in (a) extracting residue interactions from protein sequences with a transformer and (b) further integrating global and local sequential features through an ensemble learning strategy. Compared with various existing methods, EnsemPPIS exhibited either superior performance or broader applicability on multiple PPI site prediction tasks. Moreover, pattern analysis based on the interpretability of EnsemPPIS demonstrated that it is capable of learning residue interactions within the local structure of PPI sites using only sequence information. The web server of EnsemPPIS is freely available at http://idrblab.org/ensemppis.
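The three ingredients named above (attention between residues, gated convolution, and ensembling) can be illustrated with a toy sketch. This is not the EnsemPPIS implementation: the function names, the toy embeddings, the use of simple probability averaging as the ensemble rule, and the 0.5 threshold are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_weights(embeddings):
    """Scaled dot-product attention weights between every pair of residues.

    Row i says how strongly residue i attends to each residue in the
    sequence, which is how a transformer can relate distant positions.
    """
    d = len(embeddings[0])
    weights = []
    for q in embeddings:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in embeddings]
        weights.append(softmax(scores))
    return weights

def glu(values, gates):
    """Gated linear unit: value * sigmoid(gate), elementwise.

    In a gated convolutional layer, `values` and `gates` would be the
    outputs of two convolutions over the same local window.
    """
    return [v / (1.0 + math.exp(-g)) for v, g in zip(values, gates)]

def ensemble_average(prob_a, prob_b):
    """Soft voting (assumed here): mean of two per-residue probabilities."""
    return [(a + b) / 2.0 for a, b in zip(prob_a, prob_b)]

# Toy 3-residue sequence with hypothetical 2-dimensional embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn = self_attention_weights(toy_embeddings)

# Hypothetical per-residue scores from two sub-models.
combined = ensemble_average([0.9, 0.2, 0.6], [0.7, 0.4, 0.2])
predicted_sites = [p >= 0.5 for p in combined]
```

In this sketch only the first residue is called an interaction site, because it is the only one whose averaged score reaches the assumed threshold.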
