Multi-task bioassay pre-training for protein-ligand binding affinity prediction

Protein-ligand binding affinity (PLBA) prediction is a fundamental task in drug discovery. Recently, various deep learning-based models have predicted binding affinity by taking the three-dimensional structure of protein-ligand complexes as input, and they have achieved remarkable progress. However, due to the scarcity of high-quality training data, the generalization ability of current models is still limited. In addition, different bioassays report different affinity measurement labels (e.g., IC50, Ki, Kd), and differing experimental conditions inevitably introduce systematic noise, which poses a significant challenge to constructing high-precision affinity prediction models. To address these issues, we (1) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction; and (2) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked three-dimensional structures. By using multi-task pre-training to treat the prediction of different affinity labels as different tasks, and by classifying the relative rankings between samples from the same bioassay, MBP learns robust and transferable structural knowledge from our new ChEMBL-Dock dataset with varied and noisy labels. Experiments substantiate the capability of MBP as a general framework that can improve, and be tailored to, mainstream structure-based PLBA prediction tasks. To the best of our knowledge, MBP is the first affinity pre-training model, and it shows great potential for future development.
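
The idea of comparing only samples from the same bioassay can be illustrated with a minimal sketch. This is not the MBP implementation: the record layout, the function names, the toy affinity values (assumed to be on a negative-log scale where larger means stronger binding), and the choice of a logistic pairwise loss are all assumptions made for illustration. The point it shows is that pairs are never formed across assays, so labels measured under different conditions or with different label types are never compared directly.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (assay_id, label_type, measured_affinity).
# Values are assumed to be on a negative-log scale (larger = stronger binding).
SAMPLES = [
    ("assay_A", "IC50", 6.2),
    ("assay_A", "IC50", 7.5),
    ("assay_A", "IC50", 5.1),
    ("assay_B", "Ki", 8.0),
    ("assay_B", "Ki", 6.9),
]


def same_assay_pairs(samples):
    """Return index pairs (i, j) whose samples share a bioassay."""
    by_assay = defaultdict(list)
    for idx, (assay_id, _label_type, _affinity) in enumerate(samples):
        by_assay[assay_id].append(idx)
    pairs = []
    for indices in by_assay.values():
        pairs.extend(combinations(indices, 2))
    return pairs


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(samples, scores):
    """Mean logistic loss on 'does sample i bind more strongly than j?'.

    Ties in the measured affinity are treated as 'not stronger' here,
    which is a simplification of this sketch.
    """
    pairs = same_assay_pairs(samples)
    total = 0.0
    for i, j in pairs:
        diff = scores[i] - scores[j]
        i_is_stronger = samples[i][2] > samples[j][2]
        # Binary cross-entropy with sigmoid(diff) as the predicted probability.
        total += softplus(-diff) if i_is_stronger else softplus(diff)
    return total / len(pairs)


if __name__ == "__main__":
    # Scores that agree with the measured ordering give a lower loss
    # than scores that reverse it.
    agreeing = [affinity for _, _, affinity in SAMPLES]
    reversed_scores = [-affinity for _, _, affinity in SAMPLES]
    print(pairwise_ranking_loss(SAMPLES, agreeing))
    print(pairwise_ranking_loss(SAMPLES, reversed_scores))
```

In a multi-task setting, the score for each sample would presumably come from the prediction head matching its label type (IC50, Ki, or Kd); here the scores are passed in directly to keep the sketch independent of any particular model.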
