In silico prediction of chemical aquatic toxicity for marine crustaceans via machine learning.

Aquatic toxicity is a crucial endpoint for evaluating the adverse effects of chemicals on ecosystems. We therefore developed in silico methods for predicting chemical aquatic toxicity in the marine environment. First, a diverse data set covering different crustacean species was constructed. We then built local binary models using Mysidae data and global binary models using Mysidae, Palaemonidae, and Penaeidae data. Chemical structures were represented by molecular fingerprints and by molecular descriptors, each used separately. All models were built with six machine learning methods. The AUC (area under the receiver operating characteristic curve) values of the better-performing local and global models were around 0.8 and 0.9 on the test sets, respectively. We also identified several chemicals with selective toxicity toward different species. Analysis of such selective toxicity could support the design of greener chemicals for a specific environment. Finally, to understand and interpret the models, we explored the relationships between chemical aquatic toxicity and the molecular descriptors. This study should be helpful for gaining further insight into marine organisms, predicting chemical aquatic toxicity, and prioritizing environmental hazard assessment.
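As an illustration of the AUC metric reported above, the following is a minimal sketch, not the authors' evaluation code. It computes AUC as the probability that a randomly chosen toxic compound receives a higher predicted score than a randomly chosen non-toxic one, which is equivalent to the area under the ROC curve. The function name `roc_auc` and the labels and scores are hypothetical and chosen only for demonstration.

```python
def roc_auc(labels, scores):
    """Return the AUC for binary labels (1 = toxic, 0 = non-toxic).

    Computed as the fraction of (toxic, non-toxic) pairs in which the
    toxic compound has the higher score; tied pairs count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and predicted probabilities of toxicity.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of toxic from non-toxic compounds, so values around 0.8 and 0.9 indicate that most toxic/non-toxic pairs are ranked correctly.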
