Predicting persistence in the sediment compartment with a new automatic software based on the k-Nearest Neighbor (k-NN) algorithm.

The ability of a substance to resist degradation and persist in the environment needs to be readily identified in order to protect the environment and human health. Many regulations require a persistence assessment for substances that are commonly manufactured and marketed. Besides laboratory-based testing methods, in silico tools may be used to obtain a computational prediction of persistence. We present a new program for developing k-Nearest Neighbor (k-NN) models. The k-NN algorithm is a similarity-based approach that predicts the property of a substance from the experimental data available for its most similar compounds. We employed this software to identify persistence in the sediment compartment. Data on half-life (HL) in sediment were obtained from different sources and, after careful data pruning, the final dataset of 297 organic compounds was divided into four experimental classes. We developed several models with satisfactory performance: accuracy on both the training and test sets ranged between 0.90 and 0.96. We finally selected one model, which will be made available in the near future in the freely available software platform VEGA. This model offers an in silico tool for fast and inexpensive screening.
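As an illustration of the similarity-based idea described above, the following is a minimal sketch of a k-NN classifier and not the software presented here. It assumes that each compound is represented by a set of structural features, that similarity is measured with the Tanimoto coefficient, and that the prediction is a similarity-weighted vote among the k nearest neighbors. The function names, the `min_similarity` threshold, the feature sets, and the class labels are all hypothetical.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(query, training, k=3, min_similarity=0.0):
    """Predict a class for `query` from its k most similar training compounds.

    training: list of (feature_set, class_label) pairs.
    Returns the label with the largest similarity-weighted vote, or None
    when none of the k nearest neighbors reaches `min_similarity`.
    """
    # Rank all training compounds by similarity to the query, most similar first.
    scored = sorted(
        ((tanimoto(query, feats), label) for feats, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Keep the k nearest neighbors that pass the (assumed) similarity threshold.
    neighbors = [(s, label) for s, label in scored[:k] if s >= min_similarity]
    if not neighbors:
        return None  # no sufficiently similar compound: no prediction

    # Each neighbor votes for its experimental class, weighted by similarity.
    votes = defaultdict(float)
    for s, label in neighbors:
        votes[label] += s
    return max(votes, key=votes.get)


# Hypothetical toy data: feature sets and class labels are placeholders.
training = [
    ({"a", "b", "c"}, "persistent"),
    ({"a", "b", "d"}, "persistent"),
    ({"x", "y", "z"}, "not persistent"),
    ({"x", "y", "w"}, "not persistent"),
]
prediction = knn_predict({"a", "b", "e"}, training, k=3)
```

Returning no prediction when every neighbor is too dissimilar is one possible way to avoid classifying compounds that have no close analogue in the training set; whether and how the presented software applies such a threshold is not stated here.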
