SAMPL6 logP challenge: machine learning and quantum mechanical approaches

Two types of approaches were used to predict the octanol-water partition coefficients (logP) of 11 molecules as part of the SAMPL6 logP blind prediction challenge: (a) approaches that combine quantitative structure-activity relationships, quantum mechanical electronic structure methods, and machine learning, and (b) electronic structure vertical solvation approaches. For the first type, electronic structures were optimized with density functional theory (DFT), and several molecular descriptors were calculated for each molecule, including van der Waals areas and volumes, HOMO/LUMO energies, dipole moments, polarizabilities, and electrophilic and nucleophilic superdelocalizabilities. A multilinear regression model and a partial least squares model were trained on a set of 97 molecules. In addition, descriptors were generated with the Molecular Operating Environment (MOE) and used to build further machine learning models. The electronic structure vertical solvation approaches considered include DFT and domain-based local pair natural orbital methods combined with the solvated variant of the correlation consistent composite approach.
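The two routes described in the abstract can be illustrated with a minimal sketch. It is not the authors' workflow: the function names, the three-descriptor layout, and the data are hypothetical and synthetic. The first function applies the standard thermodynamic relation between solvation free energies in water and octanol and logP; the second fits a multilinear regression by ordinary least squares on a 97-row descriptor matrix, mirroring only the size of the training set mentioned above.

```python
import math
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def logp_from_solvation(dg_water, dg_octanol, temperature=298.15):
    """Convert solvation free energies (kcal/mol) into logP.

    logP = (dG_water - dG_octanol) / (R T ln 10), so a solute that is
    solvated more favorably in octanol gets a positive logP.
    """
    return (dg_water - dg_octanol) / (R_KCAL * temperature * math.log(10))


def fit_multilinear(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to a descriptor matrix."""
    return coef[0] + np.asarray(X) @ coef[1:]


# Synthetic stand-in for a descriptor table (assumption: 97 molecules,
# 3 descriptors). The values are random numbers, not real chemistry.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(97, 3))
true_coef = np.array([1.5, 0.8, -0.4, 0.2])  # hypothetical coefficients
y_train = true_coef[0] + X_train @ true_coef[1:]

coef = fit_multilinear(X_train, y_train)
y_fit = predict(coef, X_train)
```

Because the synthetic targets are noise-free, the fit recovers the hypothetical coefficients; with real descriptors and measured logP values the residuals would be nonzero, and a held-out set would be needed to judge predictive quality.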

[27]  Kameron R. Jorgensen,et al.  Enthalpies of formation for organosulfur compounds: Atomization energy and hypohomodesmotic reaction schemes via ab initio composite methods , 2012 .

[28]  Kimito Funatsu,et al.  Structure Modification toward Applicability Domain of a QSAR/QSPR Model Considering Activity/Property , 2017, Molecular informatics.

[29]  S. Grimme,et al.  A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. , 2010, The Journal of chemical physics.

[30]  Frank Neese,et al.  Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals. , 2015, The Journal of chemical physics.

[31]  Christian Mazza,et al.  Statistical significance of quantitative PCR , 2007, BMC Bioinformatics.

[32]  J. Sangster,et al.  Octanol‐Water Partition Coefficients of Simple Organic Compounds , 1989 .

[33]  Nathan J DeYonker,et al.  The correlation consistent composite approach (ccCA): an alternative to the Gaussian-n methods. , 2006, The Journal of chemical physics.

[34]  Chartchalerm Isarankura-Na-Ayudhya,et al.  A practical overview of quantitative structure-activity relationship , 2009 .

[35]  Evan Bolton,et al.  PubChem 2019 update: improved access to chemical data , 2018, Nucleic Acids Res..

[36]  S. Yousefinejad,et al.  Chemometrics tools in QSAR/QSPR studies: A historical perspective , 2015 .

[37]  Bernard R Brooks,et al.  Partition coefficients for the SAMPL5 challenge using transfer free energies , 2016, Journal of Computer-Aided Molecular Design.

[38]  Nathan J DeYonker,et al.  Toward accurate theoretical thermochemistry of first row transition metal complexes. , 2012, The journal of physical chemistry. A.

[39]  Palanisamy Thanikaivelan,et al.  Application of quantum chemical descriptor in quantitative structure activity and structure property relationship , 2000 .

[40]  Márcia M. C. Ferreira,et al.  QSPR models of boiling point, octanol–water partition coefficient and retention time index of polycyclic aromatic hydrocarbons , 2003 .

[41]  Nathan J DeYonker,et al.  A pseudopotential-based composite method: the relativistic pseudopotential correlation consistent composite approach for molecules containing 4d transition metals (Y-Cd). , 2011, The Journal of chemical physics.

[42]  Frank Neese,et al.  Natural triple excitations in local coupled cluster calculations with pair natural orbitals. , 2013, The Journal of chemical physics.

[43]  A. Becke Density-functional thermochemistry. III. The role of exact exchange , 1993 .

[44]  Stefan M. Kast,et al.  The SAMPL6 challenge on predicting aqueous pKa values from EC-RISM theory , 2018, Journal of Computer-Aided Molecular Design.

[45]  Frank Neese,et al.  SparseMaps-A systematic infrastructure for reduced scaling electronic structure methods. V. Linear scaling explicitly correlated coupled-cluster method with pair natural orbitals. , 2017, The Journal of chemical physics.

[46]  Bernard R. Brooks,et al.  Absolute and relative pKa predictions via a DFT approach applied to the SAMPL6 blind challenge , 2018, Journal of Computer-Aided Molecular Design.

[47]  David L. Mobley,et al.  The SAMPL4 host–guest blind prediction challenge: an overview , 2014, Journal of Computer-Aided Molecular Design.

[48]  Angela K. Wilson,et al.  Gaussian basis sets for use in correlated molecular calculations. X. The atoms aluminum through argon revisited , 2001 .

[49]  Matthew T. Geballe,et al.  The SAMPL3 blind prediction challenge: transfer energy overview , 2012, Journal of Computer-Aided Molecular Design.

[50]  Frank Neese,et al.  Sparse maps--A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory. , 2016, The Journal of chemical physics.

[51]  Wang,et al.  Accurate and simple analytic representation of the electron-gas correlation energy. , 1992, Physical review. B, Condensed matter.

[52]  David L. Mobley,et al.  SAMPL6 challenge results from $$pK_a$$pKa predictions based on a general Gaussian process model , 2018, J. Comput. Aided Mol. Des..

[53]  A. Becke,et al.  Density-functional exchange-energy approximation with correct asymptotic behavior. , 1988, Physical review. A, General physics.

[54]  Burke,et al.  Generalized Gradient Approximation Made Simple. , 1996, Physical review letters.

[55]  Jian Yin,et al.  Overview of the SAMPL5 host–guest challenge: Are we doing better? , 2016, Journal of Computer-Aided Molecular Design.

[56]  Jackson,et al.  Atoms, molecules, solids, and surfaces: Applications of the generalized gradient approximation for exchange and correlation. , 1992, Physical review. B, Condensed matter.