QSAR Applicability Domain Estimation by Projection of the Training Set in Descriptor Space: A Review

As Quantitative Structure-Activity Relationship (QSAR) models are increasingly used in chemical management, the reliability of their predictions has become a growing concern. The OECD QSAR Validation Principles recommend that a model be used only within its applicability domain (AD). The Setubal Workshop report provided conceptual guidance on defining a (Q)SAR AD, but this guidance is difficult to apply directly. Putting the AD concept into practice requires an operational definition from which an automatic (computerised), quantitative procedure for determining a model's AD can be designed. This review attempts to address that need by surveying methods and criteria for estimating the AD through interpolation of the training set in descriptor space. It also proposes that model response space be included in the training set representation, so that each training set chemical is a point in both an n-dimensional descriptor space and an m-dimensional model response space. Four major approaches for estimating interpolation regions in a multivariate space are reviewed and compared: range-based, distance-based, geometrical, and probability density distribution methods.
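The four families of approaches can be illustrated with a minimal Python sketch, assuming NumPy is available. The training set, the query points, the Gaussian kernel bandwidth `h`, and the cutoffs (the largest training-set Mahalanobis distance and the 5th percentile of training-set densities) are all hypothetical choices made for illustration, not values or criteria taken from the review. The geometrical check uses a convex hull built with Andrew's monotone chain and is restricted to two descriptors.

```python
import numpy as np

# Hypothetical training set: rows are chemicals, columns are two descriptors.
# Values are illustrative only and do not come from the review.
X_train = np.array([
    [1.0, 0.5], [2.0, 1.0], [3.0, 0.8], [2.5, 2.0],
    [1.5, 1.8], [3.5, 1.5], [2.2, 0.2], [1.2, 1.2],
])


def in_range(X, x):
    """Range approach: x lies within the per-descriptor min/max of X."""
    return bool(np.all((x >= X.min(axis=0)) & (x <= X.max(axis=0))))


def mahalanobis_dist(X, x):
    """Distance approach: Mahalanobis distance from the training centroid."""
    diff = x - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return float(np.sqrt(diff @ cov_inv @ diff))


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_2d(X):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(map(tuple, X))
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull_2d(X, x):
    """Geometrical approach (two descriptors only): x inside or on the hull."""
    hull = convex_hull_2d(X)
    for i in range(len(hull)):
        # For a counter-clockwise hull, interior points are left of every edge.
        if _cross(hull[i], hull[(i + 1) % len(hull)], x) < 0:
            return False
    return True


def kde_density(X, x, h=0.5):
    """Probability density approach: isotropic Gaussian kernel estimate."""
    d = X.shape[1]
    sq = np.sum((X - x) ** 2, axis=1) / h**2
    return float(np.mean(np.exp(-0.5 * sq)) / ((2 * np.pi) ** (d / 2) * h**d))


def assess(X, x):
    """Apply all four criteria; cutoffs are hypothetical, not from the review."""
    x = np.asarray(x, dtype=float)
    d_max = max(mahalanobis_dist(X, xi) for xi in X)
    dens_min = np.percentile([kde_density(X, xi) for xi in X], 5)
    return {
        "range": in_range(X, x),
        "distance": bool(mahalanobis_dist(X, x) <= d_max),
        "geometric": in_hull_2d(X, x),
        "density": bool(kde_density(X, x) >= dens_min),
    }


print(assess(X_train, [2.0, 1.0]))  # inside by all four criteria
print(assess(X_train, [1.0, 2.0]))  # inside the ranges, outside the hull
```

In this toy example the point (1.0, 2.0) falls inside every descriptor range but outside the convex hull, which shows how a bounding box can include empty corners that contain no training chemicals. To follow the proposal of including response space, model responses could be appended as extra columns of `X_train`; the range, distance, and density functions accept any number of columns, whereas the hull check would need a general n-dimensional method.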
