EGGNet, a Generalizable Geometric Deep Learning Framework for Protein Complex Pose Scoring

Computational prediction of molecule-protein interactions is key to developing new therapeutic molecules that interact with a target protein. Past work follows two independent streams: (1) predicting protein-protein interactions (PPI) between naturally occurring proteins and (2) predicting the binding affinities between proteins and small-molecule ligands (also known as drug-target interaction, or DTI). Studying the two problems in isolation has limited the ability of computational models to generalize across the PPI and DTI tasks, both of which ultimately involve non-covalent interactions with a protein target. In this work, we developed an Equivariant Graph of Graphs neural Network (EGGNet), a geometric deep learning framework for molecule-protein binding prediction that can handle three types of molecules interacting with a target protein: (1) small molecules, (2) synthetic peptides and (3) natural proteins. EGGNet leverages a graph of graphs (GoG) representation constructed from the molecular structures at atomic resolution and uses a multi-resolution equivariant graph neural network (GNN) to learn from this representation. In addition, EGGNet leverages the underlying biophysics and makes use of both atom- and residue-level interactions, which improve EGGNet’s ability to rank candidate poses from blind docking. EGGNet achieves competitive performance on both a public protein-small molecule binding affinity prediction task (80.2% top-1 success rate on CASF-2016) and a synthetic protein interface prediction task (88.4% AUPR). We envision that the proposed geometric deep learning framework can generalize to many other protein interaction prediction problems, such as binding site prediction and molecular docking, helping accelerate protein engineering and structure-based drug development.
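
To make the graph of graphs idea concrete, the following is a minimal Python sketch of a two-level structure in which each residue (or ligand) is an inner graph of atoms and the outer graph connects residues that come close in space. It is an illustration only and not EGGNet's actual construction: the `Atom` and `GraphOfGraphs` classes, the `build_gog` function, the `residue_id` keys, and both distance cutoffs (2.0 Å and 4.5 Å) are assumptions introduced here, and the abstract does not specify how edges are defined.

```python
from dataclasses import dataclass, field
from itertools import combinations
from math import dist


@dataclass
class Atom:
    element: str
    xyz: tuple        # (x, y, z) coordinates in angstroms
    residue_id: str   # hypothetical key such as "A:1"; a ligand can use one id


@dataclass
class GraphOfGraphs:
    # Inner graphs: residue_id -> atom indices (nodes) and atom pairs (edges).
    residue_atoms: dict = field(default_factory=dict)
    atom_edges: dict = field(default_factory=dict)
    # Outer graph: undirected edges between residue ids.
    residue_edges: set = field(default_factory=set)


def build_gog(atoms, atom_cutoff=2.0, residue_cutoff=4.5):
    """Build a two-level graph from atoms using assumed distance cutoffs."""
    gog = GraphOfGraphs()
    for idx, atom in enumerate(atoms):
        gog.residue_atoms.setdefault(atom.residue_id, []).append(idx)
        gog.atom_edges.setdefault(atom.residue_id, set())

    for i, j in combinations(range(len(atoms)), 2):
        a, b = atoms[i], atoms[j]
        d = dist(a.xyz, b.xyz)
        if a.residue_id == b.residue_id:
            # Atom-level edge inside one residue (inner graph).
            if d <= atom_cutoff:
                gog.atom_edges[a.residue_id].add((i, j))
        elif d <= residue_cutoff:
            # Residue-level edge when any atom pair is close (outer graph).
            gog.residue_edges.add(tuple(sorted((a.residue_id, b.residue_id))))
    return gog


atoms = [
    Atom("N", (0.0, 0.0, 0.0), "A:1"),
    Atom("C", (1.5, 0.0, 0.0), "A:1"),
    Atom("O", (4.0, 0.0, 0.0), "B:1"),
    Atom("C", (20.0, 0.0, 0.0), "B:2"),
]
gog = build_gog(atoms)
```

In this toy input, the two atoms of `A:1` form one inner-graph edge, `A:1` and `B:1` are linked in the outer graph, and `B:2` stays isolated. The all-pairs loop is quadratic in the number of atoms; a spatial index would be the natural replacement for real structures.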
