External Evaluation of QSAR Models, in Addition to Cross‐Validation: Verification of Predictive Capability on Totally New Chemicals

Dear Editors,

An interesting paper by Gütlein et al., recently published in your journal, has reopened the debate on the crucial topic of QSAR model validation, which over the past decade has been the subject of wide discussion in the scientific and regulatory communities. Many notable scientific papers have been published (I cite here only a few of the most pertinent) with different underlying ideas on the "best" way to validate QSAR models, using various methodological approaches: (a) only by cross-validation (CV), simple or double CV; (b) by an additional external validation [10–17] (better if verified, in my opinion, by different statistical parameters), after the necessary preliminary internal validation by CV. The common final aim is to propose good QSAR models that are not only statistically robust, but also of verified high predictive capability. The discrepancy between these two approaches lies in one point: how to verify the predictive performance of a QSAR model when it is applied to completely new chemicals.

In the Introduction to their paper, Gütlein et al. wrote: "Many (Q)SAR researchers consider validation with a single external test set as the 'gold standard' to assess model performance and they question the reliability of cross-validation procedures". In my opinion, this point is not commented on clearly, at least with reference to my cited work, so I wish to clarify my validation approach in order to highlight and resolve some misunderstandings.

First of all, I am sure that no good QSAR modeller would disagree that CV (not only by leave-one-out, LOO, but also by leave-many-out, LMO, and/or bootstrap) is a necessary preliminary step in any QSAR validation: it is unquestionably the best way to verify the statistical performance of each model in terms of the robustness and predictivity of the partial sub-models on the chemicals that are iteratively put aside (held out) in the test sets. According to some authors, including me, this should be defined as internal validation, because by the end of the complete modelling process the molecular structures of all the chemicals have been seen within the validation procedure, and their structural information has contributed to the molecular descriptor selection, at least in the CV runs in which they were placed in the training sub-set. Therefore, they are not really external (completely new) to the final model. Indeed, internal validation parameters for proposed QSAR models must always be reported in publications to guarantee model robustness.

Moreover, in QSAR modelling it is important to distinguish between an approach that proposes predicted data from a specific single model (easily reproduced by any user) and an approach that produces predicted data obtained by averaging the results of multiple models, and therefore by a more complex algorithm. In my research I always apply the first approach, while the work discussed by Gütlein et al. in their paper uses the second. The reason to prefer a single model, which is a unique regression equation based on a few selected descriptors with their respective coefficients, is mainly that the "unambiguous algorithm" (requested by the second of the well-known "OECD Principles for the validation of QSAR models" for applicability in regulation) should be the simplest and most easily reproducible, and therefore easily applicable by a wide range of users, including regulators working under the new European legislation on chemicals, REACH.
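To make the distinction concrete, the following minimal sketch (in Python with scikit-learn, on simulated data, with forward selection used as a hypothetical stand-in for genetic-algorithm descriptor selection; it is an illustration of the principle, not the QSARINS workflow) contrasts internal LOO cross-validation on the training chemicals with prediction of an external set that was put aside before any descriptor selection:

import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))            # 60 chemicals, 20 candidate descriptors
y = X[:, :3] @ np.array([1.5, -2.0, 0.8]) + rng.normal(scale=0.3, size=60)

# External set put aside BEFORE descriptor selection: these chemicals never
# influence which descriptors enter the model.
X_tr, X_ext, y_tr, y_ext = train_test_split(X, y, test_size=0.25, random_state=1)

# Descriptor selection on the training set only (forward selection here,
# as a simple stand-in for a genetic algorithm).
sel = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3, cv=5)
sel.fit(X_tr, y_tr)
Xs_tr, Xs_ext = sel.transform(X_tr), sel.transform(X_ext)

model = LinearRegression().fit(Xs_tr, y_tr)

# Internal validation: LOO cross-validation of the training chemicals.
y_loo = cross_val_predict(LinearRegression(), Xs_tr, y_tr, cv=LeaveOneOut())
q2_loo = 1 - np.sum((y_tr - y_loo) ** 2) / np.sum((y_tr - y_tr.mean()) ** 2)

# External validation: prediction of compounds never used in selection or fitting.
y_hat_ext = model.predict(Xs_ext)
q2_ext = 1 - np.sum((y_ext - y_hat_ext) ** 2) / np.sum((y_ext - y_tr.mean()) ** 2)

print(f"Q2_LOO (internal) = {q2_loo:.3f}   Q2_ext (external) = {q2_ext:.3f}")

Because the external compounds play no role in descriptor selection or coefficient fitting, their predictions probe the behaviour of the model on genuinely new chemicals, which is exactly the point at issue here.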
According to Principle 4, discussed in depth in my previous paper and in the Guidance Document on the OECD Principles, a model must be verified for its goodness of fit (by R²), robustness (by internal cross-validation: Q²LOO and Q²LMO) and external predictivity (on external-set compounds, which did not take part in the model development). The Guidance Document also makes a clear distinction between internal and external validation in this sense. Only models with good internal validation parameters, which guarantee their robustness, should be chosen from among all the single models obtained by using a Genetic Algorithm (GA) as the method for descriptor selection in Ordinary Least Squares (OLS) regression (my QSAR approach, as implemented in my in-house software QSARINS). However, my personal experience (and not only mine) is that some QSAR models show good performance when verified
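For reference, a sketch of the standard definitions of the parameters mentioned above, under the usual conventions ($y_i$ the observed response, $\hat{y}_i$ the value fitted by the model, $\hat{y}_{i/i}$ the value predicted for compound $i$ by the model built leaving compound $i$ out, and $\bar{y}_{\mathrm{tr}}$ the training-set mean), is:

$$
R^2 = 1 - \frac{\sum_{i \in \mathrm{tr}} (y_i - \hat{y}_i)^2}{\sum_{i \in \mathrm{tr}} (y_i - \bar{y}_{\mathrm{tr}})^2},
\qquad
Q^2_{\mathrm{LOO}} = 1 - \frac{\sum_{i \in \mathrm{tr}} (y_i - \hat{y}_{i/i})^2}{\sum_{i \in \mathrm{tr}} (y_i - \bar{y}_{\mathrm{tr}})^2},
$$

with Q²LMO defined analogously by iteratively leaving out groups of compounds, while external predictivity is assessed only on the external compounds, for instance by

$$
Q^2_{F1} = 1 - \frac{\sum_{j \in \mathrm{ext}} (y_j - \hat{y}_j)^2}{\sum_{j \in \mathrm{ext}} (y_j - \bar{y}_{\mathrm{tr}})^2}
$$

or by one of the other external validation criteria (including the concordance correlation coefficient) compared in the cited work.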

[1] Douglas M. Hawkins et al., The Problem of Overfitting, J. Chem. Inf. Model., 2004.
[2] Knut Baumann et al., Validation tools for variable subset regression, J. Comput. Aided Mol. Des., 2004.
[3] Paola Gramatica et al., QSARINS: A new software for the development, analysis, and validation of QSAR MLR models, J. Comput. Chem., 2013.
[4] Knut Baumann et al., Cross-validation as the objective function for variable-selection techniques, 2003.
[5] Paola Gramatica et al., The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models, 2003.
[6] Alexander Golbraikh et al., Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling?, J. Chem. Inf. Model., 2012.
[7] Paola Gramatica et al., QSAR Modeling is not "Push a Button and Find a Correlation": A Case Study of Toxicity of (Benzo‐)triazoles on Algae, Molecular Informatics, 2012.
[8] H. Kubinyi et al., Three-dimensional quantitative similarity-activity relationships (3D QSiAR) from SEAL similarity matrices, Journal of Medicinal Chemistry, 1998.
[9] A. Tropsha et al., Beware of q2!, Journal of Molecular Graphics & Modelling, 2002.
[10] Paola Gramatica et al., Real External Predictivity of QSAR Models. Part 2. New Intercomparable Thresholds for Different Validation Criteria and the Need for Scatter Plot Inspection, J. Chem. Inf. Model., 2012.
[11] Hugo Kubinyi, From Narcosis to Hyperspace: The History of QSAR, 2002.
[12] Paola Gramatica et al., Real External Predictivity of QSAR Models: How To Evaluate It? Comparison of Different Validation Criteria and Proposal of Using the Concordance Correlation Coefficient, J. Chem. Inf. Model., 2011.
[13] Alexander Tropsha et al., Best Practices for QSAR Model Development, Validation, and Exploitation, Molecular Informatics, 2010.
[14] Alexander Golbraikh et al., Predictive QSAR modeling workflow, model applicability domains, and virtual screening, Current Pharmaceutical Design, 2007.
[15] Stefan Kramer et al., A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR, Molecular Informatics, 2013.
[16] Douglas M. Hawkins et al., Assessing Model Fit by Cross-Validation, J. Chem. Inf. Comput. Sci., 2003.
[17] Paul Geladi et al., Principles of Proper Validation: use and abuse of re-sampling for validation, 2010.
[18] Paola Gramatica, Principles of QSAR models validation: internal and external, 2007.