Holistic Prediction of pKa in Diverse Solvents Based on Machine Learning Approach.

Although numerous theoretical approaches have been developed for predicting aqueous pKa, fast and accurate prediction of non-aqueous pKa values has remained a major challenge. On the basis of the iBonD experimental pKa database, curated across 39 solvents, a holistic pKa prediction model was established using a machine learning approach. Structural and physical organic parameters combined descriptors (SPOC) were introduced to represent the electronic and structural features of molecules. With SPOC and ionic status labelling (ISL), the holistic models trained with a neural network or the XGBoost algorithm showed the best prediction performance, with MAE values as low as 0.87 pKa units. The capability of prediction in diverse solvents allows a comprehensive mapping of all possible pKa correlations between different solvents, verifying the existence of transfer learning features. The holistic model was validated with high accuracy by predicting aqueous pKa and micro-pKa values of pharmaceutical molecules and pKa values of organocatalysts in DMSO and MeCN. An online prediction platform (http://pka.luoszgroup.com) was constructed based on the current model; it can provide pKa predictions that are otherwise out of reach for different types of X-H acidity in the most commonly used solvents.
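As a rough illustration of the input representation and the error metric mentioned above, the following minimal sketch concatenates structural bits, physical organic parameters, an ionic status label, and a solvent label into one feature vector, and computes the mean absolute error (MAE) in pKa units. The abstract does not specify how SPOC, ISL, or the solvent are encoded, so the one-hot encoding, the three-entry vocabularies, and the function names here are assumptions made for illustration only, not the authors' implementation.

```python
from typing import List, Sequence

# Hypothetical solvent vocabulary; the source database spans 39 solvents.
SOLVENTS = ["water", "DMSO", "MeCN"]
# Hypothetical ionic status classes (assumed encoding of ISL).
IONIC_STATES = ["neutral", "cation", "anion"]


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Return a one-hot vector marking the position of value in vocabulary."""
    if value not in vocabulary:
        raise ValueError(f"unknown label: {value!r}")
    return [1.0 if item == value else 0.0 for item in vocabulary]


def build_features(
    structural: Sequence[float],
    physical_organic: Sequence[float],
    ionic_state: str,
    solvent: str,
) -> List[float]:
    """Concatenate structural bits, physical organic parameters,
    the ionic status label, and the solvent label into one vector."""
    return (
        [float(x) for x in structural]
        + [float(x) for x in physical_organic]
        + one_hot(ionic_state, IONIC_STATES)
        + one_hot(solvent, SOLVENTS)
    )


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between measured and predicted pKa."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

A vector built this way could be passed to any regressor, such as a neural network or a gradient-boosted tree model. Carrying the solvent as part of the input is what would let a single model cover several solvents at once.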
