Quantifying sources of uncertainty in drug discovery predictions with probabilistic models

Knowing the uncertainty in a prediction is critical when making expensive investment decisions and when patient safety is paramount, but machine learning (ML) models in drug discovery typically provide only a single best estimate and ignore all sources of uncertainty. Predictions from these models may therefore be over-confident, which can put patients at risk and waste resources when compounds that are destined to fail are developed further. Probabilistic predictive models (PPMs) can incorporate uncertainty in both the data and the model, and return a distribution of predicted values that represents the uncertainty in the prediction. PPMs not only let users know when predictions are uncertain, but their intuitive output also makes risk easier to communicate and improves decision making. Many popular machine learning methods have a PPM or Bayesian analogue, making PPMs easy to fit into current workflows. We use toxicity prediction as a running example, but the same principles apply to all prediction models used in drug discovery. The consequences of ignoring uncertainty and how PPMs account for uncertainty are also described. We aim to make the discussion accessible to a broad non-mathematical audience. Equations are provided to make ideas concrete for mathematical readers (but can be skipped without loss of understanding) and code is available for computational researchers (this https URL).
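
As a rough illustration of the difference between a single best estimate and a predictive distribution, the sketch below uses a conjugate Beta-Binomial model for a toxicity rate. This is not the model from the paper: the Beta(1, 1) prior, the counts, and the function names `beta_posterior` and `summarize` are all assumptions chosen only to keep the example small. Both hypothetical data sets give the same point estimate (2/3), yet the posterior distributions differ sharply in width.

```python
import math
import random


def beta_posterior(toxic, tested, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for a toxicity rate (assumed Beta prior)."""
    return prior_a + toxic, prior_b + (tested - toxic)


def summarize(a, b, n_draws=20000, seed=0):
    """Closed-form mean and sd, plus a sampled 95% interval, of Beta(a, b)."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return {"mean": mean, "sd": sd, "interval": (lower, upper)}


# Hypothetical counts: same observed toxic fraction, very different sample sizes.
sparse = summarize(*beta_posterior(toxic=2, tested=3))
dense = summarize(*beta_posterior(toxic=200, tested=300))

for label, result in (("3 compounds", sparse), ("300 compounds", dense)):
    lo, hi = result["interval"]
    print(f"{label}: mean={result['mean']:.3f} sd={result['sd']:.3f} "
          f"95% interval=({lo:.3f}, {hi:.3f})")
```

A point-estimate model would report 0.667 in both cases; the distributional output makes it visible that the estimate based on three compounds is far less certain.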
