Imputation of Assay Bioactivity Data Using Deep Learning

We describe a novel deep neural network method and apply it to the imputation of assay pIC50 values. Unlike conventional machine learning approaches, the method takes sparse bioactivity data as input, of the kind typically found in public and commercial databases, which lets it learn directly from correlations between activities measured in different assays. In two case studies on public-domain data sets, we show that the neural network method outperforms traditional quantitative structure-activity relationship (QSAR) models and other leading approaches. Furthermore, restricting attention to the most confident predictions raises the accuracy of our method to R² > 0.9, compared with R² = 0.44 when all predictions are reported.
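The effect of reporting only the most confident predictions can be illustrated with a minimal sketch. It is not the method described here: the data are synthetic, the helper names (`r_squared`, `r_squared_most_confident`) and the `keep_fraction` parameter are hypothetical, and it assumes each prediction comes with an uncertainty estimate that tracks its true error.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def r_squared_most_confident(y_true, y_pred, y_std, keep_fraction):
    """R^2 over the keep_fraction of predictions with the smallest stated uncertainty."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_std = np.asarray(y_std, dtype=float)
    n_keep = max(2, int(round(keep_fraction * y_std.size)))
    keep = np.argsort(y_std)[:n_keep]  # indices of the most confident predictions
    return r_squared(y_true[keep], y_pred[keep])


# Synthetic illustration only (assumed values, not results from the case studies).
rng = np.random.default_rng(0)
n = 2000
y_true = rng.normal(6.0, 1.0, size=n)     # hypothetical pIC50 values
y_std = rng.uniform(0.1, 1.5, size=n)     # hypothetical per-prediction uncertainty
y_pred = y_true + rng.normal(0.0, y_std)  # error grows with the stated uncertainty

r2_all = r_squared(y_true, y_pred)
r2_top = r_squared_most_confident(y_true, y_pred, y_std, keep_fraction=0.1)
print(f"R^2, all predictions: {r2_all:.2f}")
print(f"R^2, most confident 10%: {r2_top:.2f}")
```

In this sketch the gain depends entirely on the uncertainty estimates being informative; if `y_std` were unrelated to the actual errors, filtering on it would not be expected to improve R².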
