A novel unambiguous strategy of molecular feature extraction in machine learning assisted predictive models for environmental properties

Environmental properties of compounds provide significant information for treating organic pollutants and help steer chemical process design and environmental science toward eco-friendly technology. Traditional group contribution methods play an important role in property estimation, but they have known drawbacks, such as scattered predicted values for certain groups of compounds. To address these issues, this research proposes a molecular feature extraction strategy that is interpretable and can discriminate between isomers. Using Henry's law constant data for organic compounds in water, we developed a hybrid predictive model that combines the proposed strategy with a neural network framework. The structure of the model is optimized using cross-validation and grid search to improve its robustness. The model is further improved by introducing the plane of best fit descriptor as an input and by adopting k-means clustering in sampling. Compared with models reported in the literature, the developed model demonstrates improved generality and higher accuracy while using fewer molecular features.
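As an illustration of the plane of best fit descriptor mentioned in the abstract, the sketch below computes the score as it is commonly defined: the mean distance of a molecule's heavy atoms from their least-squares plane. This is a minimal NumPy sketch, not the authors' implementation. The function name `plane_of_best_fit_score` and the toy coordinates are assumptions made for this example, and obtaining 3D conformer coordinates is outside its scope.

```python
import numpy as np


def plane_of_best_fit_score(coords):
    """Mean distance of points from their least-squares plane.

    `coords` is an (N, 3) array of heavy-atom coordinates (hypothetical input;
    generating a 3D conformer is assumed to happen elsewhere).
    """
    pts = np.asarray(coords, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 3 or pts.shape[0] < 3:
        raise ValueError("coords must have shape (N, 3) with N >= 3")

    # The least-squares plane passes through the centroid of the points.
    centered = pts - pts.mean(axis=0)

    # The plane normal is the right singular vector that belongs to the
    # smallest singular value of the centered coordinates.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]

    # Perpendicular distance of each point from the plane, then the average.
    distances = np.abs(centered @ normal)
    return float(distances.mean())


if __name__ == "__main__":
    # Toy coordinates, not real molecular geometries.
    flat = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
    puckered = flat + [(0, 0, 0.1), (0, 0, -0.1)]
    print("flat:", plane_of_best_fit_score(flat))
    print("puckered:", plane_of_best_fit_score(puckered))
```

A perfectly planar set of atoms scores essentially zero, and the score grows as atoms move out of the plane, which is why a descriptor of this kind can add three-dimensional shape information to connectivity-based features.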
