How to Judge Predictive Quality of Classification and Regression Based QSAR Models

Abstract: Quantitative structure-activity relationship (QSAR) modelling is a statistical approach used in drug discovery, environmental fate modelling, and the prediction of properties and activities of new, untested compounds. Validation is one of the key steps in checking the robustness and reliability of QSAR models, and its methodological aspects have been the subject of strong debate within the academic and regulatory communities. One of the principles (Principle 4) of the Organisation for Economic Co-operation and Development (OECD) refers to the need to establish “appropriate measures of goodness-of-fit, robustness and predictivity” for any QSAR model. Validation strategies are recognized as decisive steps for checking the statistical acceptability of a constructed model and its applicability to a new set of data, and thus for judging the confidence of its predictions. Validation is a holistic practice: in addition to statistical judgment, it covers the quality of the data, the applicability of the model for prediction, and mechanistic interpretation. Validation strategies depend largely on validation metrics. Given the importance of validation approaches and validation parameters in developing successful and acceptable QSAR models, we present an overview of traditional as well as relatively new validation metrics used to judge the quality of both regression-based and classification-based QSAR models.
