Protein-Ligand Scoring with Convolutional Neural Networks

Computational approaches to drug discovery can reduce the time and cost associated with experimental assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amount of protein-ligand binding and structural data enables the use of deep learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive three-dimensional (3D) representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses, and between known binders and nonbinders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses for both pose prediction and virtual screening.
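The idea of a 3D representation that a CNN can consume can be illustrated with a minimal voxelization sketch. Each atom is spread as a density over a cubic grid, and each atom type gets its own channel, much like the color channels of an image. Everything specific here is an assumption for illustration, not a detail stated in this abstract:

* the function names `atom_density` and `voxelize`;
* the truncated-Gaussian density;
* a single shared atom radius;
* the grid size and resolution;
* the integer channel indices.

```python
import numpy as np


def atom_density(d, r):
    """Density of one atom at distance d (assumed form: truncated Gaussian).

    Gaussian inside radius r, then a quadratic tail that matches the
    Gaussian's value and slope at d = r and falls to zero at d = 1.5 * r.
    """
    d = np.asarray(d, dtype=float)
    e2 = np.e ** 2
    gauss = np.exp(-2.0 * d ** 2 / r ** 2)
    tail = 4.0 / (e2 * r ** 2) * d ** 2 - 12.0 / (e2 * r) * d + 9.0 / e2
    return np.where(d < r, gauss, np.where(d < 1.5 * r, tail, 0.0))


def voxelize(coords, types, n_types, center, dim=12, resolution=0.5, radius=1.0):
    """Map atoms onto a (n_types, dim, dim, dim) grid of per-channel densities.

    Hypothetical helper: every atom uses the same radius (in angstroms), and
    types[i] is the channel index of atom i.
    """
    coords = np.asarray(coords, dtype=float)
    # Grid-point offsets along one axis, symmetric about the box center
    offsets = (np.arange(dim) - (dim - 1) / 2.0) * resolution
    axes = [offsets + c for c in center]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    points = np.stack([gx, gy, gz], axis=-1)  # shape (dim, dim, dim, 3)

    grid = np.zeros((n_types, dim, dim, dim))
    for xyz, t in zip(coords, types):
        dist = np.linalg.norm(points - xyz, axis=-1)
        grid[t] += atom_density(dist, radius)  # densities of same-type atoms add up
    return grid


# Usage: two atoms in different (hypothetical) channels, e.g. receptor C and ligand O
grid = voxelize([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]], types=[0, 1],
                n_types=2, center=(0.6, 0.0, 0.0))
print(grid.shape)  # (2, 12, 12, 12)
```

Keeping atom types in separate channels means convolutional filters can respond to type-specific spatial patterns, such as a ligand atom of one type near a receptor atom of another. The network learns these patterns from data rather than having them hand-coded as terms in an empirical scoring function.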
