Accuracy (of Prediction)

A non-specialist might assume that the main objective of a QSAR study is to predict whether an untested compound will be active or inactive (or to perform virtual screening, i.e., to make predictions for a whole virtual library of compounds). In practice, much work has been devoted to “explanatory” QSAR, which relates changes in molecular structure to changes in activity, and only recently has there been considerable interest in predictivity: QSAR is now being used in virtual screening to find biologically active molecules. Models fail for many reasons [1,2]: bad data, bad methodology, inappropriate descriptors, domain inapplicability [3], and so on. This article can address only a few of these issues. Vendors now supply models that may or may not be applicable to a corporate virtual library [2], and many (in-house approved) models are available to non-experts on corporate intranets. How are these users to judge whether a model applies to their compounds?

[1] William L. Jorgensen, et al. QSAR/QSPR and Proprietary Data, 2006, Journal of Chemical Information and Modeling.

[2] A. Tropsha, et al. Beware of q2!, 2002, Journal of Molecular Graphics & Modelling.

[3] Arthur M. Doweyko, et al. 3D-QSAR illusions, 2004, J. Comput. Aided Mol. Des.

[4] Douglas M. Hawkins, et al. The Problem of Overfitting, 2004, J. Chem. Inf. Model.

[5] Alexander Golbraikh, et al. Rational selection of training and test sets for the development of validated QSAR models, 2003, J. Comput. Aided Mol. Des.

[6] Paola Gramatica, et al. The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models, 2003.

[7] A. Tropsha, et al. Beware of q2, 2002.

[8] Hugo Kubinyi, et al. Validation and Predictivity of QSAR Models, 2004.

[9] J. Irwin, et al. Benchmarking sets for molecular docking, 2006, Journal of Medicinal Chemistry.

[10] H. Kubinyi, et al. Three-dimensional quantitative similarity-activity relationships (3D QSiAR) from SEAL similarity matrices, 1998, Journal of Medicinal Chemistry.

[11] Robert P. Sheridan, et al. Similarity to Molecules in the Training Set Is a Good Discriminator for Prediction Accuracy in QSAR, 2004, J. Chem. Inf. Model.

[12] Gerald M. Maggiora, et al. On Outliers and Activity Cliffs-Why QSAR Often Disappoints, 2006, J. Chem. Inf. Model.

[13] Yi Li, et al. In silico ADME/Tox: why models fail, 2003, J. Comput. Aided Mol. Des.