Predictive deep learning models for environmental properties

Abstract As an essential environmental property, the octanol-water partition coefficient (KOW) quantifies the lipophilicity of a compound and can be further used to predict its toxicity. It is therefore an indispensable factor in the screening and development of green solvents, particularly for unconventional and novel compounds. Herein, a deep-learning-assisted predictive model has been developed to calculate log KOW values for organic compounds accurately and reliably. An embedding algorithm was established to generate signatures automatically from molecular structures, expressing their structural information and connectivity. The tree-structured long short-term memory (Tree-LSTM) network was then used in conjunction with the signature descriptor for automatic feature selection and coupled with a back-propagation neural network to form a deep neural network (DNN), which models the quantitative structure-property relationship (QSPR) used to predict log KOW. Compared with an authoritative estimation method, the proposed DNN-based QSPR model exhibited better predictive accuracy and greater discriminative power for structural isomers and stereoisomers. As such, the proposed deep learning approach can act as a promising and intelligent tool for developing environmental property prediction methods that guide the development or screening of green solvents.
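To make the idea of a signature descriptor concrete, the following is a minimal sketch of how an atom-centred signature can encode connectivity as a nested string. It is a simplified illustration and not the embedding algorithm of this work: the function names, the adjacency-dictionary input and the restriction to heavy atoms without bond orders are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical toy inputs: heavy-atom graphs only, no bond orders or hydrogens.
ETHANOL_ATOMS: Dict[int, str] = {0: "C", 1: "C", 2: "O"}            # C-C-O
ETHANOL_BONDS: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1]}
ETHER_ATOMS: Dict[int, str] = {0: "C", 1: "O", 2: "C"}              # C-O-C
ETHER_BONDS: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1]}


def atom_signature(atoms, bonds, root, height, parent=None) -> str:
    """Describe the neighbourhood of `root` up to `height` bonds away.

    Children are sorted so the string does not depend on atom numbering.
    Simplification: only the immediate parent is excluded, so ring atoms
    may reappear at larger heights.
    """
    label = atoms[root]
    if height == 0:
        return label
    children = sorted(
        atom_signature(atoms, bonds, nbr, height - 1, root)
        for nbr in bonds[root]
        if nbr != parent
    )
    return label + "".join(f"({child})" for child in children)


def molecule_signature(atoms, bonds, height) -> List[str]:
    """Sorted list of all atom signatures of a given height."""
    return sorted(atom_signature(atoms, bonds, a, height) for a in atoms)


if __name__ == "__main__":
    print(molecule_signature(ETHANOL_ATOMS, ETHANOL_BONDS, 1))
    print(molecule_signature(ETHER_ATOMS, ETHER_BONDS, 1))
```

Ethanol and dimethyl ether share the formula C2H6O, yet their height-1 signatures differ, which illustrates why a connectivity-aware descriptor can help a model separate structural isomers. A tree-shaped string of this kind is also a natural input for a tree-structured network such as a Tree-LSTM.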
