Assessment of Machine Learning Reliability Methods for Quantifying the Applicability Domain of QSAR Regression Models

Chemical space is vast, and experimental data recording molecular properties cover only a small part of it. We therefore need to identify subspaces, or domains, in which QSAR models can be applied with confidence. Within such a domain, predictions are expected to be reliable: subsequent experimental investigation of the compounds should find that the predicted values closely match the measured ones. Standard approaches in QSAR assume that predictions are more reliable for compounds that are "similar" to those in subspaces with denser experimental data. Here, we report on a study of an alternative set of techniques recently proposed in the machine learning community. These methods quantify prediction confidence by estimating the prediction error at the point of interest. Our study covers 20 public QSAR data sets with continuous response and assesses the quality of 10 reliability scoring methods by measuring how well their scores correlate with prediction error. We show that these alternative approaches can outperform standard reliability scores that rely only on similarity to compounds in the training set. The results also indicate that the quality of a reliability scoring method is sensitive to data set characteristics and to the regression method used in QSAR. We demonstrate that, at the cost of increased computational complexity, these dependencies can be leveraged by integrating scores from various reliability estimation approaches. The reliability estimation techniques described in this paper have been implemented in an open-source add-on package ( https://bitbucket.org/biolab/orange-reliability ) to the Orange data mining suite.
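The evaluation idea in the abstract, scoring a reliability method by how well it correlates with prediction error, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than one of the 20 QSAR sets, the regression model is plain least squares, and the two scores (mean distance to the nearest training points as a similarity-style score, variance across bootstrap-trained models as an error-estimation-style score) are stand-ins, since the abstract does not name the 10 methods. It uses only NumPy and does not reproduce the API of the orange-reliability package.

```python
import numpy as np

def mean_knn_distance(X_train, X_query, k=5):
    """Similarity-style score: mean Euclidean distance to the k nearest training points."""
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

def bootstrap_variance(X_train, y_train, X_query, n_models=30, seed=0):
    """Error-estimation-style score: prediction variance across bootstrap-trained linear models."""
    rng = np.random.default_rng(seed)
    A_train = np.c_[X_train, np.ones(len(X_train))]  # append intercept column
    A_query = np.c_[X_query, np.ones(len(X_query))]
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), len(X_train))  # resample with replacement
        w, *_ = np.linalg.lstsq(A_train[idx], y_train[idx], rcond=None)
        preds.append(A_query @ w)
    return np.var(preds, axis=0)

def spearman(a, b):
    """Rank correlation; assumes no ties, which holds for continuous scores."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

def response(X, rng):
    # Hypothetical nonlinear response, so a linear model errs more far from the training data.
    return np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.1, len(X))

rng = np.random.default_rng(42)
X_train = rng.normal(0.0, 1.0, size=(200, 5))
X_test = rng.normal(0.0, 2.0, size=(100, 5))  # wider spread than the training set
y_train = response(X_train, rng)
y_test = response(X_test, rng)

# Fit the regression model and compute absolute prediction errors on the test set.
A_train = np.c_[X_train, np.ones(len(X_train))]
w, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
abs_error = np.abs(np.c_[X_test, np.ones(len(X_test))] @ w - y_test)

# A useful reliability score should correlate with the absolute error.
rho_knn = spearman(mean_knn_distance(X_train, X_test), abs_error)
rho_var = spearman(bootstrap_variance(X_train, y_train, X_test), abs_error)
print(f"kNN distance vs |error|:       {rho_knn:.2f}")
print(f"bootstrap variance vs |error|: {rho_var:.2f}")
```

A higher rank correlation means the score orders compounds more like their actual errors do. Which score does better in this toy setting says nothing about the paper's findings, which depend on the data set and the regression method.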
