Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression

Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods for assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using 3 techniques: the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out using 2 independent test sets of 777 and 27 chemicals. The functional inner regression tree showed the best predictive performance, with accuracies of 81.5% on the training set (825 chemicals) and 81.0% on test set I (777 chemicals). Performance of the developed models on the 2 test sets was then compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models; this comparison also favored the functional inner regression tree model. The model built in the present study offers reasonable predictive performance compared with existing models while relying on a transparent algorithm. The developed models were also used to interpret the mechanisms of biodegradation.
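The evaluation workflow described above (fit a classifier on a training set, then report predictive accuracy on the training set and on an independent test set) can be illustrated with a minimal sketch. It covers only the logistic regression step, in plain Python, and is not the authors' implementation. The two descriptors (a halogen count and an ester-group count), the toy compounds, the labels, and the function names are all hypothetical and chosen purely for illustration; they are not taken from the study's data.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights (bias stored last) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi  # gradient of the log-loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    # Class 1 = "biodegradable" in this toy setup.
    score = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if sigmoid(score) >= 0.5 else 0

def accuracy(w, X, y):
    # Fraction of compounds whose predicted class matches the label.
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)

# Hypothetical descriptors: [halogen count, ester-group count].
train_X = [[0, 2], [0, 1], [1, 2], [0, 3], [3, 0], [2, 0], [4, 1], [3, 1]]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical external test set, disjoint from the training compounds.
test_X = [[1, 3], [0, 4], [5, 1], [4, 0]]
test_y = [1, 1, 0, 0]

weights = fit_logistic(train_X, train_y)
train_acc = accuracy(weights, train_X, train_y)
test_acc = accuracy(weights, test_X, test_y)
print(f"training accuracy: {train_acc:.3f}")
print(f"external test accuracy: {test_acc:.3f}")
```

Reporting accuracy separately on the training set and on compounds never seen during fitting mirrors the external validation described in the abstract. The toy data are linearly separable by construction, so the accuracies printed here say nothing about real biodegradability data.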
