Best of Both Worlds: Combining Pharma Data and State of the Art Modeling Technology To Improve in Silico pKa Prediction

In a unique collaboration between a software company and a pharmaceutical company, we were able to develop a new in silico pKa prediction tool with outstanding prediction quality. An existing pKa prediction method from Simulations Plus based on artificial neural network ensembles (ANNE), microstates analysis, and literature data was retrained with a large homogeneous data set of drug-like molecules from Bayer. The new model was thus built with curated sets of ∼14,000 literature pKa values (∼11,000 compounds, representing literature chemical space) and ∼19,500 pKa values experimentally determined at Bayer Pharma (∼16,000 compounds, representing industry chemical space). Model validation was performed with several test sets consisting of a total of ∼31,000 new pKa values measured at Bayer. For the largest and most difficult test set with >16,000 pKa values that were not used for training, the original model achieved a mean absolute error (MAE) of 0.72, root-mean-square error (RMSE) of 0.94, and squared correlation coefficient (R²) of 0.87. The new model achieves significantly improved prediction statistics, with MAE = 0.50, RMSE = 0.67, and R² = 0.93. It is commercially available as part of the Simulations Plus ADMET Predictor release 7.0. Good predictions are only of value when delivered effectively to those who can use them. The new pKa prediction model has been integrated into Pipeline Pilot and the PharmacophorInformatics (PIx) platform used by scientists at Bayer Pharma. Different output formats allow customized application by medicinal chemists, physical chemists, and computational chemists.
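
The three validation statistics quoted above (MAE, RMSE, and R²) can be illustrated with a minimal sketch. It assumes that R² is the squared Pearson correlation between measured and predicted pKa values, as the phrase "squared correlation coefficient" suggests. The function name `prediction_stats` and the example values are hypothetical and are not taken from the study or from any Simulations Plus or Bayer software.

```python
import math


def prediction_stats(observed, predicted):
    """Return (MAE, RMSE, R^2) for paired observed and predicted pKa values.

    R^2 is computed here as the squared Pearson correlation coefficient,
    which is an assumption about the definition used in the abstract.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    # Per-compound prediction errors (predicted minus observed).
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Squared Pearson correlation from centered sums of products.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    r2 = cov * cov / (var_o * var_p)

    return mae, rmse, r2


if __name__ == "__main__":
    # Hypothetical pKa values, for illustration only.
    measured = [2.0, 4.0, 6.0, 8.0]
    modeled = [2.5, 3.5, 6.5, 7.5]
    mae, rmse, r2 = prediction_stats(measured, modeled)
    print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}, R2 = {r2:.2f}")
```

Because RMSE squares each error before averaging, it weights large deviations more heavily than MAE does, which is consistent with the RMSE values in the abstract being larger than the corresponding MAE values.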
