The Characterisation of (Quantitative) Structure-Activity Relationships: Preliminary Guidance

In November 2004, the OECD Member Countries and the European Commission adopted five principles for the validation of (quantitative) structure-activity relationships ([Q]SARs) intended for use in the regulatory assessment of chemicals. International agreement on a set of validation principles was important, not only to provide regulatory bodies with a scientific basis for making decisions on the acceptability of data generated by (Q)SARs, but also to promote the mutual acceptance of (Q)SAR models by improving the transparency and consistency of (Q)SAR reporting. According to the OECD Principles for (Q)SAR validation, a (Q)SAR model that is proposed for regulatory use should be associated with five types of information: 1) a defined endpoint; 2) an unambiguous algorithm; 3) a defined domain of applicability; 4) appropriate measures of goodness-of-fit, robustness and predictivity; and 5) a mechanistic interpretation, if possible. Taken together, these five principles form the basis of a conceptual framework for characterising (Q)SAR models, and of reporting formats for describing the model characteristics in a transparent manner. Under the proposed REACH legislation in the EU, there are provisions for the use of estimated data generated by (Q)SARs, both as a substitute for experimental data, and as a supplement to experimental data in weight-of-evidence approaches. It is foreseen that (Q)SARs will be used for the three main regulatory goals of hazard assessment, risk assessment and PBT/vPvB assessment. In the Registration process under REACH, the registrant will be able to use (Q)SAR data in the registration dossier, provided that adequate documentation is supplied to support the validity of the model(s) used. This report provides preliminary guidance on how to characterise (Q)SARs according to the OECD validation principles.
It is emphasised that the understanding of how to characterise (Q)SAR models is evolving, and that the content of the current report reflects the understanding and perspectives of the authors at the time of writing (November 2005). It is therefore likely that an update will be produced in the future for the benefit of those who need to submit (Industry) or evaluate (Authorities) chemical information based (partly) on (Q)SARs. It is also noted that this document does not provide guidance on the use of (Q)SAR reporting formats, or on criteria for the acceptance of (Q)SAR estimates, since EU guidance on these topics still needs to be developed.
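
As a rough illustration of the fourth principle listed above, the sketch below computes two statistics that are commonly reported for regression-type (Q)SARs: the coefficient of determination R² as a measure of goodness-of-fit, and the leave-one-out cross-validated Q² as one indicator of robustness. It is not taken from the report. The one-descriptor linear model, the function names and the data values are all assumptions made for illustration only, and predictivity in the sense of the principle would additionally require an external test set, which is not shown.

```python
# Minimal sketch: goodness-of-fit (R^2) and leave-one-out robustness (Q^2)
# for a hypothetical one-descriptor linear (Q)SAR, y = a + b * x.
# All names and data below are illustrative assumptions.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(xs, ys):
    """R^2 = 1 - SS_res / SS_tot, using the model fitted to all data."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Q^2 = 1 - PRESS / SS_tot, where each point is predicted by a
    model refitted without that point (leave-one-out)."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Made-up descriptor values (x) and endpoint values (y), not real data.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
endpoint = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]

r2 = r_squared(descriptor, endpoint)
q2 = q_squared_loo(descriptor, endpoint)
print(f"R2 = {r2:.3f}, Q2(LOO) = {q2:.3f}")
```

For a least-squares model, Q² from leave-one-out cannot exceed R², so a large gap between the two is often read as a sign of overfitting. A high Q² on its own, however, does not establish external predictivity.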
