Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling.

In human safety assessment through quantitative structure-activity relationship (QSAR) modeling, the concept of applicability domain (AD) plays a central role. The Organisation for Economic Co-operation and Development (OECD) recommends, as principle 3 of its QSAR model validation principles, that a predictive QSAR model have "a defined domain of applicability." Studying the AD makes it possible to estimate the uncertainty in the prediction for a particular molecule from how similar that molecule is to the training compounds used to develop the model. The AD is an active research topic, and many methods have been designed to estimate the competence of a model and the confidence in its outcome for a given prediction task. Characterizing the interpolation space is therefore central to defining the AD. The reported AD methods are diverse and were built on different hypotheses and algorithms. This multiplicity of methodologies confuses end users and makes comparing the AD of different models a complex issue. In this chapter we summarize the important concepts of AD, including details of the available methods for computing the AD, together with their thresholds and criteria for estimating the AD through training set interpolation in the descriptor space. The ideas of transparent domain and decision domain are also discussed. To help readers determine the AD in their own projects, practical examples and available open source software tools are provided.
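As an illustration of what training set interpolation in the descriptor space can mean in practice, the following minimal sketch implements a descriptor range (bounding box) check, one of the simplest ways to flag a query compound as inside or outside the AD. It is not taken from the chapter: the function names `descriptor_ranges` and `in_domain` and the toy descriptor values are hypothetical, and the criterion (every descriptor of the query must fall within the minimum and maximum observed in the training set) is assumed here purely for illustration.

```python
from typing import List, Sequence, Tuple


def descriptor_ranges(training: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return the (min, max) of each descriptor column over the training set."""
    return [(min(col), max(col)) for col in zip(*training)]


def in_domain(query: Sequence[float], ranges: Sequence[Tuple[float, float]]) -> bool:
    """True if every descriptor of the query lies within the training range."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(query, ranges))


# Hypothetical training set: one row per compound, one column per descriptor.
training_set = [
    [1.2, 150.0, 2.0],
    [2.5, 210.0, 3.0],
    [0.8, 180.0, 1.0],
    [1.9, 165.0, 4.0],
]

ranges = descriptor_ranges(training_set)

# A query within all training ranges is treated as interpolation.
inside = in_domain([1.5, 170.0, 2.0], ranges)

# A query whose second descriptor exceeds the training maximum is extrapolation.
outside = in_domain([1.5, 250.0, 2.0], ranges)

print(inside, outside)
```

A range check of this kind ignores correlations between descriptors and empty regions inside the box, which is one reason distance-based and other approaches are also used to define the AD.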
