Machine learning meets pKa

We present a small-molecule pKa prediction tool written entirely in Python. It predicts the macroscopic pKa value and is trained on a literature compilation of monoprotic compounds. Of the machine learning models tested, random forest performed best in five-fold cross-validation (mean absolute error = 0.682, root mean squared error = 1.032, squared correlation coefficient r² = 0.82). On two external validation sets, our model performs comparably to Marvin and better than a recently published open-source model. Our Python tool and all data are freely available at https://github.com/czodrowskilab/Machine-learning-meets-pKa.
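As an illustration of how the three reported error measures and a five-fold split can be computed, the following is a minimal sketch using only the Python standard library. It is not the authors' code: the function names, the interleaved fold assignment, and the toy pKa values are hypothetical. It also assumes that r² denotes the squared Pearson correlation between experimental and predicted values, which the abstract does not state explicitly.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between experimental and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared deviation."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def squared_pearson_r(y_true, y_pred):
    """Squared Pearson correlation (assumed meaning of r^2 here)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def k_fold_indices(n_samples, n_folds=5):
    """Yield (train, test) index lists; sample i goes to test fold i % n_folds.

    Hypothetical interleaved assignment; real pipelines usually shuffle first.
    """
    for fold in range(n_folds):
        test = [i for i in range(n_samples) if i % n_folds == fold]
        train = [i for i in range(n_samples) if i % n_folds != fold]
        yield train, test


# Hypothetical experimental and predicted pKa values, for illustration only.
experimental = [4.8, 9.2, 3.1, 10.5, 7.4]
predicted = [5.0, 8.7, 3.6, 10.1, 7.4]

mae = mean_absolute_error(experimental, predicted)
rmse = root_mean_squared_error(experimental, predicted)
r2 = squared_pearson_r(experimental, predicted)
folds = list(k_fold_indices(10, n_folds=5))

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  r2={r2:.3f}")
```

In a cross-validation run, each model would be fitted on the `train` indices of a fold and scored on its `test` indices, with the metrics then averaged over the five folds. RMSE penalizes large errors more heavily than MAE, which is consistent with the reported RMSE (1.032) exceeding the reported MAE (0.682).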
