The Utility of Structure–Activity Relationship (SAR) Models for Prediction and Covariate Selection in Developmental Toxicity: Comparative Analysis of Logistic Regression and Decision Tree Models

Structure–activity relationship (SAR) models can be used to predict the biological activity of potential developmental toxicants, whose adverse effects include death, structural abnormalities, altered growth and functional deficiencies in the developing organism. Physico-chemical descriptors of spatial, electronic and lipophilic properties were used to derive SAR models by two modeling approaches, logistic regression and Classification and Regression Tree (CART), using a new developmental toxicity database of 293 chemicals (FDA/TERIS). Both single models and ensembles of models (built by bagging) were derived to predict toxicity. The empirical distributions of the prediction measures were assessed by repeated random partitioning of the data set. Both the decision tree and the logistic regression derived developmental SAR models exhibited modest prediction accuracy. Compared to the single model, bagging tended to enhance prediction accuracy and reduce the variability of the prediction measures for CART-based models, but not consistently for logistic-based models. Prediction accuracy of single logistic-based models was higher than that of single CART-based models, but bagged CART-based models were more predictive. Descriptor selection in SAR, which is used to understand the developmental mechanism, was highly dependent on the modeling approach: although prediction accuracy was similar in the two modeling approaches, the descriptors retained in their models were inconsistent.
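The evaluation scheme described in the abstract (single versus bagged models, compared over repeated random partitions of the data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than the FDA/TERIS set, the base learner is a one-split decision stump standing in for a full CART tree, logistic regression is omitted, and the split fraction, ensemble size, number of repeats and all function names are hypothetical choices, not values taken from the study.

```python
import random
import statistics


def stump_predict(stump, row):
    """Predict 0/1 from a stump (feature index, threshold, class above threshold)."""
    j, t, above = stump
    return above if row[j] > t else 1 - above


def fit_stump(X, y):
    """Exhaustively pick the single split with the fewest training errors."""
    best_stump, best_errors = None, len(y) + 1
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):
                stump = (j, t, above)
                errors = sum(stump_predict(stump, r) != lab for r, lab in zip(X, y))
                if errors < best_errors:
                    best_stump, best_errors = stump, errors
    return best_stump


def fit_bagged(X, y, n_models, rng):
    """Bagging: fit one stump per bootstrap sample (drawn with replacement)."""
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagged_predict(models, row):
    """Majority vote over the ensemble (use an odd n_models to avoid ties)."""
    votes = sum(stump_predict(m, row) for m in models)
    return 1 if 2 * votes > len(models) else 0


def accuracy(predict, X, y):
    return sum(predict(r) == lab for r, lab in zip(X, y)) / len(y)


def repeated_partitioning(X, y, n_repeats=10, test_fraction=1 / 3, n_models=15, seed=0):
    """Repeat random train/test splits; return per-split test accuracies."""
    rng = random.Random(seed)
    n = len(y)
    n_test = int(n * test_fraction)
    single_acc, bagged_acc = [], []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        Xte, yte = [X[i] for i in test], [y[i] for i in test]
        stump = fit_stump(Xtr, ytr)
        models = fit_bagged(Xtr, ytr, n_models, rng)
        single_acc.append(accuracy(lambda r: stump_predict(stump, r), Xte, yte))
        bagged_acc.append(accuracy(lambda r: bagged_predict(models, r), Xte, yte))
    return single_acc, bagged_acc


def make_toy_data(n=120, seed=1):
    """Synthetic two-descriptor data with a noisy linear class boundary."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [1 if row[0] + row[1] + rng.gauss(0, 0.5) > 0 else 0 for row in X]
    return X, y


X, y = make_toy_data()
single_acc, bagged_acc = repeated_partitioning(X, y)
for name, accs in (("single", single_acc), ("bagged", bagged_acc)):
    print(f"{name}: mean={statistics.mean(accs):.3f} sd={statistics.stdev(accs):.3f}")
```

The two lists returned by `repeated_partitioning` are the empirical distributions of test accuracy for the single and bagged models, so their means and standard deviations can be compared directly. On toy data like this the outcome says nothing about the study's results; the sketch only shows the shape of the procedure.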
