Comparative Studies on Some Metrics for External Validation of QSPR Models

Quantitative structure-property relationship (QSPR) models that predict the properties of untested chemicals can be used to prioritize the synthesis and experimental testing of new compounds. Validation plays a crucial role in judging the reliability of such predictions. The QSPR literature now gives serious attention to external validation, and predictive quality is in most cases judged from the predictions for a single test set, as summarized by one or more external validation metrics. Here, we show that a single QSPR model may show varying prediction quality, as measured by external validation metrics such as Q²(F1), Q²(F2), Q²(F3), CCC, and r²(m) (all differently modified forms of predicted variance, each of which can in principle reach a maximum value of 1), depending on the composition and size of the test set. This report therefore questions the common "classic" practice of performing external validation on a single test set and drawing conclusions about a model's predictive quality from a particular validation metric. The present work further demonstrates that, among the external validation metrics considered, r²(m) yields values that differ statistically significantly from those of the others, among which CCC is the most optimistic (least stringent). Furthermore, at a given acceptance threshold, r²(m) provides the most stringent criterion of external validation (especially with Δr²(m) at its highest tolerated value of 0.2), which may be adopted in regulatory decision support processes.
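To make the compared metrics concrete, the following is a minimal Python sketch of how they are commonly computed, assuming the usual literature definitions: Q²(F1) scales the test-set prediction error by deviations from the training-set mean, Q²(F2) by deviations from the test-set mean, Q²(F3) compares the mean squared test error with the training-set variance, CCC is Lin's concordance correlation coefficient, and r²(m) = r²(1 − √|r² − r₀²|), with r₀² taken from a regression through the origin and the average and Δr²(m) obtained by interchanging the axes. All function names and the toy data are hypothetical, and the centered r₀² used here is only one of several definitions in circulation; other definitions change the r²(m) values.

```python
"""Sketch of external validation metrics for a QSPR test set (hypothetical helper names)."""
import math
from statistics import fmean


def _sse(obs, pred):
    """Sum of squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred))


def _ss_about(values, center):
    """Sum of squared deviations of values from a given center."""
    return sum((v - center) ** 2 for v in values)


def q2_f1(obs_test, pred_test, obs_train):
    """Q2(F1): test-set deviations measured from the TRAINING-set mean."""
    return 1.0 - _sse(obs_test, pred_test) / _ss_about(obs_test, fmean(obs_train))


def q2_f2(obs_test, pred_test):
    """Q2(F2): test-set deviations measured from the TEST-set mean."""
    return 1.0 - _sse(obs_test, pred_test) / _ss_about(obs_test, fmean(obs_test))


def q2_f3(obs_test, pred_test, obs_train):
    """Q2(F3): mean squared test error relative to the training-set variance."""
    mse_test = _sse(obs_test, pred_test) / len(obs_test)
    var_train = _ss_about(obs_train, fmean(obs_train)) / len(obs_train)
    return 1.0 - mse_test / var_train


def ccc(obs, pred):
    """Lin's concordance correlation coefficient between observed and predicted values."""
    mo, mp = fmean(obs), fmean(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    denom = _ss_about(obs, mo) + _ss_about(pred, mp) + len(obs) * (mo - mp) ** 2
    return 2.0 * sxy / denom


def _r2(y, x):
    """Squared Pearson correlation between y and x."""
    my, mx = fmean(y), fmean(x)
    sxy = sum((a - my) * (b - mx) for a, b in zip(y, x))
    return sxy ** 2 / (_ss_about(y, my) * _ss_about(x, mx))


def _r0_2(y, x):
    """Centered r0^2 for the through-origin fit y = k*x (one of several definitions in use)."""
    k = sum(a * b for a, b in zip(y, x)) / sum(b * b for b in x)
    return 1.0 - sum((a - k * b) ** 2 for a, b in zip(y, x)) / _ss_about(y, fmean(y))


def rm2(y, x):
    """r2(m) = r2 * (1 - sqrt(|r2 - r0^2|)), with y on the vertical axis."""
    r2 = _r2(y, x)
    return r2 * (1.0 - math.sqrt(abs(r2 - _r0_2(y, x))))


def rm2_avg_delta(obs, pred):
    """Average r2(m) and delta r2(m) over both axis assignments."""
    a = rm2(obs, pred)  # observed (y) vs predicted (x)
    b = rm2(pred, obs)  # axes interchanged
    return (a + b) / 2.0, abs(a - b)


if __name__ == "__main__":
    # Hypothetical toy data: every prediction overshoots by exactly one unit.
    obs_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    obs_test = [1.0, 2.0, 3.0, 4.0]
    pred_test = [2.0, 3.0, 4.0, 5.0]
    avg, delta = rm2_avg_delta(obs_test, pred_test)
    print(f"Q2(F1) = {q2_f1(obs_test, pred_test, obs_train):.3f}")
    print(f"Q2(F2) = {q2_f2(obs_test, pred_test):.3f}")
    print(f"Q2(F3) = {q2_f3(obs_test, pred_test, obs_train):.3f}")
    print(f"CCC    = {ccc(obs_test, pred_test):.3f}")
    print(f"avg r2(m) = {avg:.3f}, delta r2(m) = {delta:.3f}")
```

On this toy data the Pearson r² is exactly 1, yet Q²(F2) is only 0.2 while CCC is about 0.71 and the average r²(m) is about 0.68, a small-scale illustration of how differently these metrics can score the same set of predictions. Recomputing them over several alternative test-set splits of one dataset is a simple way to probe the dependence on test-set composition and size discussed above.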
