Using beta binomials to estimate classification uncertainty for ensemble models

Background: Quantitative structure-activity relationship (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Great strides have been made in estimating their overall reliability, but to fully realize that potential, researchers and regulators need to know how confident they can be in individual predictions.

Results: Submodels in an ensemble model that have been trained on different subsets of a shared training pool represent multiple samples of the model space, and the degree of agreement among them contains information on the reliability of ensemble predictions. For artificial neural network ensembles (ANNEs), we examined two methods for determining the ensemble classification: one using vote tallies and the other averaging individual network outputs. In both cases the distribution of predictions across positive vote tallies can be reasonably well modeled as a beta binomial distribution, as can the distribution of errors. Together, these two distributions can be used to estimate the probability that a given predictive classification will be in error. Large data sets comprising logP, Ames mutagenicity, and CYP2D6 inhibition data are used to illustrate and validate the method. The distributions of predictions and errors for the training pool accurately predicted the distributions of predictions and errors for large external validation sets, even when the numbers of positive and negative examples in the training pool were not balanced. Moreover, the likelihood of a given compound being prospectively misclassified, as a function of the degree of consensus between networks in the ensemble, could in most cases be estimated accurately from the fitted beta binomial distributions for the training pool.

Conclusions: Confidence in an individual predictive classification by an ensemble model can be accurately assessed by examining the distributions of predictions and errors as a function of the degree of agreement among the constituent submodels. Further, ensemble uncertainty estimation can often be improved by adjusting the voting or classification threshold based on the parameters of the error distribution. Finally, the profiles for models whose predictive uncertainty estimates are not reliable provide clues to that effect without the need for comparison to an external test set.
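The way the two fitted distributions might combine into a per-prediction error estimate can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the probability of error at a given positive vote tally k is the overall training-pool error rate multiplied by the ratio of the error distribution's mass at k to the prediction distribution's mass at k. The function names, the ensemble size of 33, and all parameter values are hypothetical.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """Probability of k positive votes out of n under a beta binomial(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def error_probability(k, n, pred_params, err_params, error_rate):
    """Estimated chance that a prediction with k positive votes is wrong.

    Assumption (not taken from the source): the expected number of errors
    at tally k divided by the expected number of predictions at tally k,
    which reduces to error_rate * pmf_err(k) / pmf_pred(k).
    """
    p_pred = beta_binomial_pmf(k, n, *pred_params)
    p_err = beta_binomial_pmf(k, n, *err_params)
    if p_pred == 0.0:
        return float("nan")  # no predictions expected at this tally
    return min(1.0, error_rate * p_err / p_pred)


if __name__ == "__main__":
    n_networks = 33              # hypothetical ensemble size
    pred_params = (0.4, 0.5)     # hypothetical fit: predictions pile up at the extremes
    err_params = (2.0, 2.0)      # hypothetical fit: errors cluster near split votes
    overall_error_rate = 0.15    # hypothetical training-pool error rate
    for k in (0, 8, 16, 25, 33):
        p = error_probability(k, n_networks, pred_params, err_params,
                              overall_error_rate)
        print(f"{k:2d} positive votes: estimated error probability {p:.3f}")
```

With parameters of this shape, near-unanimous tallies receive a low estimated error probability and split votes a high one, which is the qualitative behavior the abstract describes. In practice the shape parameters would be fitted to the training-pool tallies rather than chosen by hand.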

[37]  J. Lindsey,et al.  Response Surfaces for Overdispersion in the Study of the Conditions for Fish Eggs Hatching , 1999, Biometrics.