Introducing Conformal Prediction in Predictive Modeling. A Transparent and Flexible Alternative to Applicability Domain Determination

Conformal prediction is introduced as an alternative approach to applicability domain estimation. It offers three advantages. First, the approach is based on a consistent and well-defined mathematical framework. Second, the confidence level is straightforward to interpret: for example, at a confidence level of 0.8 the conformal predictor makes errors (i.e., true values falling outside the assigned prediction range) on at most 20% of its predictions. Third, the confidence level can be varied to suit the situation in which the model is applied, and the consequences of changing it are readily understandable: prediction ranges widen or narrow, and the effect can be inspected immediately. We demonstrate the usefulness of conformal prediction by applying it to 10 publicly available data sets.
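To make the confidence-level interpretation concrete, the following is a minimal sketch of split (inductive) conformal regression. It is not the implementation used in the paper. The synthetic data, the straight-line model fitted with NumPy, the absolute-residual nonconformity score, and the helper name `conformal_half_width` are all assumptions chosen for illustration. The sketch shows how raising the confidence level widens the prediction range, and how the fraction of true values inside the range can be checked on held-out data.

```python
import numpy as np


def conformal_half_width(cal_residuals, confidence):
    """Half-width of the prediction range from calibration residuals.

    Hypothetical helper: uses absolute residuals as nonconformity scores
    and picks the ceil((n + 1) * confidence)-th smallest one.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = scores.size
    k = int(np.ceil((n + 1) * confidence))
    # Too few calibration points for this confidence: range is unbounded.
    return np.inf if k > n else float(scores[k - 1])


# Synthetic data (assumption): linear signal plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=3000)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=3000)

# Disjoint training, calibration, and test sets.
x_train, y_train = x[:500], y[:500]
x_cal, y_cal = x[500:1000], y[500:1000]
x_test, y_test = x[1000:], y[1000:]

# Underlying model (assumption): least-squares straight line.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def predict(values):
    return slope * values + intercept


# Residuals on data the model has not been fitted to.
cal_residuals = y_cal - predict(x_cal)

results = {}
for confidence in (0.8, 0.95):
    half_width = conformal_half_width(cal_residuals, confidence)
    covered = np.abs(y_test - predict(x_test)) <= half_width
    results[confidence] = (half_width, float(covered.mean()))
    print(f"confidence={confidence}: range = prediction +/- {half_width:.2f}, "
          f"fraction of test values inside = {covered.mean():.3f}")
```

In this sketch every test compound receives a range of the same width. Nonconformity scores normalized by a per-example difficulty estimate would give ranges that vary between examples, which is closer in spirit to an applicability domain assessment, but that refinement is beyond what this example shows.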
