Introduction of rm2(rank) metric incorporating rank-order predictions as an additional tool for validation of QSAR/QSPR models

In silico techniques involving the development of quantitative regression models have been used extensively to predict the activity, property and toxicity of new chemicals. Whether such models are acceptable, and can subsequently be applied for prediction, is judged using several internal and external validation statistics. Among the different validation metrics, Q2 and R2pred are the classical metrics for internal and external validation, respectively. Additionally, the rm2 metrics introduced by Roy and coworkers have been widely used by several groups of authors to ensure close agreement between the predicted and observed response data. However, none of the currently available and commonly used validation metrics provides any information about rank-order predictions for the test set. To bring rank-order information into validation metrics that have conventionally been calculated with algorithms based on Pearson's correlation coefficient, the rm2(rank) metric has been introduced in this work as a new variant of the rm2 series of metrics. The ability of this new metric to capture rank-order predictions was assessed by applying it to judge the quality of predictions from regression-based quantitative structure–activity/property relationship (QSAR/QSPR) models for four different data sets. The validation metrics calculated in each case were compared for their ability to reflect rank-order predictions, based on their correlation with the conventional Spearman's rank correlation coefficient. In a sum of ranking differences analysis performed with Spearman's rank correlation coefficient as the reference, the rm2(rank) metric showed the smallest ranking difference from the reference metric.
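
The rm2 family combines the squared Pearson correlation r2 between observed and predicted values with r0², the determination coefficient of the regression through the origin, as rm2 = r2 × (1 − sqrt(|r2 − r0²|)). The sketch below is a minimal illustration, assuming that rm2(rank) is obtained by applying this same formula to the tie-averaged ranks of the observed and predicted test-set values; the original work may use different scaling, tie-handling or axis conventions for r0². All function names are hypothetical. Spearman's rho is included for comparison, computed as Pearson's r on ranks.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # 0-based positions i..j correspond to ranks i+1..j+1; use their mean.
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r0_squared(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Determination coefficient of observed vs predicted with the intercept forced to zero."""
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rm2(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """rm2 = r2 * (1 - sqrt(|r2 - r0^2|))."""
    r2 = pearson_r(y_obs, y_pred) ** 2
    return r2 * (1.0 - math.sqrt(abs(r2 - r0_squared(y_obs, y_pred))))


def rm2_rank(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    # ASSUMPTION: rm2(rank) is the rm2 formula applied to rank-transformed data.
    return rm2(average_ranks(y_obs), average_ranks(y_pred))


def spearman_rho(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r computed on the tie-averaged ranks."""
    return pearson_r(average_ranks(y_obs), average_ranks(y_pred))


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.0, 4.0, 9.0, 16.0, 25.0]  # correct order, wrong scale
    print(f"rm2       = {rm2(observed, predicted):.3f}")
    print(f"rm2(rank) = {rm2_rank(observed, predicted):.3f}")
    print(f"rho       = {spearman_rho(observed, predicted):.3f}")
```

With observed values 1 to 5 and predictions 1, 4, 9, 16, 25, the ordering is preserved but the scale is not. In this sketch, rm2 therefore falls well below 1 (about 0.44), while rm2(rank) and Spearman's rho both equal 1. This is the kind of rank-order information that the value-based metrics do not express on their own.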
The close correlation of the rm2(rank) metric with Spearman's rank correlation coefficient indicates that the new metric appropriately reflects rank-order predictions for the test set. It can therefore be used as an additional validation tool, alongside the conventional metrics, for assessing the acceptability and predictive ability of a QSAR/QSPR model.
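
The sum of ranking differences (SRD) comparison behind this conclusion can be sketched as follows. In this illustration, each validation metric is a column of values over the same set of models. The models are ranked once by that column and once by the reference column (Spearman's rho), and the SRD is the sum of the absolute rank differences, so a smaller SRD means an ordering closer to the reference. The function names, the toy values and the normalization to a percentage of the largest possible SRD are assumptions for illustration. The statistical validation of SRD values against random rankings is not included.

```python
from typing import Mapping, Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def srd(column: Sequence[float], reference: Sequence[float]) -> float:
    """Sum of absolute differences between object ranks by column and by reference."""
    if len(column) != len(reference):
        raise ValueError("column and reference must cover the same objects")
    return sum(abs(a - b) for a, b in zip(average_ranks(column), average_ranks(reference)))


def srd_max(n: int) -> int:
    """Largest possible SRD for n objects (fully reversed order): floor(n^2 / 2)."""
    return n * n // 2


def srd_percent(metrics: Mapping[str, Sequence[float]],
                reference: Sequence[float]) -> dict[str, float]:
    """SRD of every metric column, as a percentage of the maximum possible SRD."""
    worst = srd_max(len(reference))
    return {name: 100.0 * srd(values, reference) / worst for name, values in metrics.items()}


if __name__ == "__main__":
    # Hypothetical values for four models; these are NOT data from the study.
    spearman = [0.90, 0.70, 0.80, 0.50]
    metrics = {
        "metric_A": [0.85, 0.60, 0.70, 0.40],  # same ordering as the reference
        "metric_B": [0.10, 0.30, 0.20, 0.50],  # exactly reversed ordering
    }
    for name, value in sorted(srd_percent(metrics, spearman).items(), key=lambda kv: kv[1]):
        print(f"{name}: SRD = {value:.1f}% of maximum")
```

Run as a script, this prints 0.0% for metric_A and 100.0% for metric_B. These are the two extremes of agreement with the reference ordering. A real comparison would fill the columns with the metrics computed for each model and data set.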