Beware of q2

Validation is a crucial aspect of any quantitative structure–activity relationship (QSAR) modeling study. This paper examines one of the most popular validation criteria, the leave-one-out cross-validated R2 (LOO q2). A high value of this statistic (q2 > 0.5) is often taken as proof of the high predictive ability of a model. In this paper, we show that this assumption is generally incorrect. For 3D QSAR, the lack of correlation between a high LOO q2 and high predictive ability of a QSAR model was established earlier [Pharm. Acta Helv. 70 (1995) 149; J. Chemomet. 10 (1996) 95; J. Med. Chem. 41 (1998) 2553]. Here, we use two-dimensional (2D) molecular descriptors and the k nearest neighbors (kNN) QSAR method to analyze several datasets. No correlation between the values of q2 for the training set and predictive ability for the test set was found for any of the datasets. Thus, a high value of LOO q2 appears to be a necessary but not a sufficient condition for a model to have high predictive power. We argue that this is a general property of QSAR models developed using LOO cross-validation. We emphasize that external validation is the only way to establish a reliable QSAR model. We formulate a set of criteria for evaluating the predictive ability of QSAR models. © 2002 Elsevier Science Inc. All rights reserved.
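The two quantities contrasted in the abstract can be illustrated with a minimal sketch. It assumes plain kNN regression with unweighted averaging over Euclidean neighbors (no variable selection), synthetic descriptors, and hypothetical function names (`knn_predict`, `loo_q2`, `external_r2`); none of these come from the paper. LOO q2 is computed as 1 - PRESS / SS_total, and the external measure is taken here as the squared Pearson correlation between observed and predicted test-set activities, which is one common convention rather than the only one.

```python
import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    """Predict the activity of x as the mean activity of its k nearest training compounds."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()


def loo_q2(X, y, k=3):
    """Leave-one-out cross-validated R2: 1 - PRESS / total sum of squares."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # hold out compound i
        preds[i] = knn_predict(X[mask], y[mask], X[i], k)
    press = np.sum((y - preds) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total


def external_r2(X_train, y_train, X_test, y_test, k=3):
    """Squared Pearson correlation between observed and predicted test-set activities."""
    preds = np.array([knn_predict(X_train, y_train, x, k) for x in X_test])
    r = np.corrcoef(y_test, preds)[0, 1]
    return r ** 2


if __name__ == "__main__":
    # Synthetic data (assumption): 40 "compounds", 5 descriptors,
    # activity driven by the first descriptor plus noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 5))
    y = X[:, 0] + 0.1 * rng.normal(size=40)

    X_train, y_train = X[:30], y[:30]
    X_test, y_test = X[30:], y[30:]

    print("LOO q2 (training set):", loo_q2(X_train, y_train))
    print("External R2 (test set):", external_r2(X_train, y_train, X_test, y_test))
```

Reporting both numbers side by side for every candidate model is what allows the comparison described above: q2 uses only the training set, while the external measure uses compounds the model never saw.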
