A Bayesian Approach to Predict Solubility Parameters

Solubility is a ubiquitous phenomenon in many aspects of materials science. While solubility can be determined by considering the cohesive forces in a liquid via the Hansen solubility parameters (HSP), quantitative structure–property relationship models are often used for prediction, notably because of their low computational cost. Here, gpHSP, an interpretable and versatile probabilistic approach to determining HSP, is reported. The model is based on Gaussian processes, a Bayesian machine learning approach that provides uncertainty bounds on its predictions. gpHSP achieves its flexibility by leveraging a variety of input data, such as SMILES strings, COSMOtherm simulations, and quantum chemistry calculations. gpHSP is built on experimentally determined HSP, including a general solvent set aggregated from the literature and a polymer set experimentally characterized by the authors. On all sets, a high degree of agreement with the experimental values is obtained, surpassing well-established machine learning methods. The general applicability of gpHSP to the miscibility of organic semiconductors, drug compounds, and general solvents is demonstrated, and the approach can be extended to other domains. gpHSP is a fast and accurate toolbox that could be applied to molecular design for solution-processing technologies.
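
To illustrate what "uncertainty bounds" means for a Gaussian process, the following is a minimal sketch of textbook GP regression with a squared-exponential kernel, written in plain Python. It is not the gpHSP implementation: the kernel choice, the hyperparameters, the one-dimensional descriptor, and the toy target values are all assumptions made for illustration, and the numbers are invented rather than measured HSP.

```python
import math

def rbf(x, y, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two descriptor vectors (assumed kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-0.5 * sq / length_scale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X_train, y_train, x_new, noise=1e-2):
    """Return the posterior mean and standard deviation at x_new."""
    n = len(X_train)
    y_mean = sum(y_train) / n  # constant prior mean set to the training average
    K = [[rbf(X_train[i], X_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [y - y_mean for y in y_train]))
    k_star = [rbf(x, x_new) for x in X_train]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 0.0))

# Toy data: one hypothetical descriptor per molecule and an invented target value.
X_toy = [[0.0], [1.0], [2.0]]
y_toy = [15.0, 16.0, 18.0]

mean_near, std_near = gp_predict(X_toy, y_toy, [1.0])   # inside the training range
mean_far, std_far = gp_predict(X_toy, y_toy, [10.0])    # far from any training point
print(f"near: {mean_near:.2f} +/- {std_near:.2f}")
print(f"far:  {mean_far:.2f} +/- {std_far:.2f}")
```

The predicted standard deviation is small close to the training points and grows back to the prior value far from them, which is the behavior that lets a GP-based model flag predictions for molecules unlike those in its training set.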
