Predicting Lipophilicity of Drug‐Discovery Molecules using Gaussian Process Models

Many drug failures are due to an unfavorable ADMET profile (Absorption, Distribution, Metabolism, Excretion & Toxicity). Lipophilicity is intimately connected with ADMET, and in today’s drug discovery process the octanol/water partition coefficient log P and its pH-dependent counterpart log D have to be taken into account early in lead discovery. Commercial tools for ‘in silico’ prediction of ADMET or lipophilicity parameters have usually been trained on relatively small and mostly neutral molecules; their accuracy on industrial in-house data therefore leaves considerable room for improvement (see Bruneau et al. and references therein). Using modern kernel-based machine learning algorithms, so-called Gaussian Processes (GPs), this study constructs several log P and log D7 models whose predictions are excellent and compare favorably with state-of-the-art tools on both benchmark and in-house data sets.
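The abstract names Gaussian Process regression as the modelling technique. As an illustration only, here is a minimal NumPy sketch of plain GP regression with a squared-exponential (RBF) kernel. This is not the authors' implementation. The kernel choice, the hand-set hyperparameters, and the helper names `rbf_kernel` and `gp_fit_predict` are assumptions. The inputs are hypothetical two-dimensional "descriptor" vectors with a synthetic target, not real log P data. The sketch returns both a predictive mean and a standard deviation, which is how a GP attaches an error bar to each individual prediction.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    # Squared Euclidean distances between every row of A and every row of B.
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    sq = np.maximum(sq, 0.0)  # guard against tiny negative round-off
    return signal_var * np.exp(-0.5 * sq / length_scale**2)

def gp_fit_predict(X_train, y_train, X_test,
                   length_scale=1.0, signal_var=1.0, noise_var=0.1):
    # Hyperparameters are fixed by hand here (assumption); they are not fitted.
    y_mean = y_train.mean()  # GP models the residual around the training mean
    K = rbf_kernel(X_train, X_train, length_scale, signal_var)
    # Cholesky factor of the noisy training covariance K + noise_var * I.
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X_train)))
    # alpha = (K + noise_var * I)^{-1} (y - mean), via two solves with L.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf_kernel(X_test, X_train, length_scale, signal_var)
    mean = y_mean + K_s @ alpha
    # Predictive variance: prior variance minus explained part, plus observation noise.
    v = np.linalg.solve(L, K_s.T)
    var = signal_var - np.sum(v**2, axis=0) + noise_var
    return mean, np.sqrt(var)

# Hypothetical descriptor vectors and a synthetic target; not real log P data.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(40)

# One query inside the training cloud, one far outside it.
X_new = np.array([[0.0, 0.0], [10.0, 10.0]])
mu, sd = gp_fit_predict(X, y, X_new, length_scale=1.0,
                        signal_var=1.0, noise_var=0.01)
print(mu, sd)
```

The query far from all training points falls back to the training mean, with a standard deviation close to the prior value sqrt(signal_var + noise_var). The query inside the data cloud gets a much tighter error bar. In practice GP hyperparameters are usually chosen by maximizing the marginal likelihood rather than set by hand as in this sketch.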

[1] G. G. Stokes, "J.", 1890, The New Yale Book of Quotations.

[2] Christopher M. Bishop, et al. Neural Networks for Pattern Recognition, 1995.

[3] J. van Leeuwen, et al. Neural Networks: Tricks of the Trade, 2002, Lecture Notes in Computer Science.

[4] Frank R. Burden, et al. Quantitative Structure-Activity Relationship Studies Using Gaussian Processes, 2001, J. Chem. Inf. Comput. Sci.

[5] Gunnar Rätsch, et al. An Introduction to Kernel-Based Learning Algorithms, 2001, IEEE Trans. Neural Networks.

[6] D. P. Enot, et al. Gaussian Process: An Efficient Technique to Solve Quantitative Structure-Property Relationship Problems, 2001, SAR and QSAR in Environmental Research.

[7] Pierre Bruneau, et al. Search for Predictive Generic Model of Aqueous Solubility Using Bayesian Neural Nets, 2001, J. Chem. Inf. Comput. Sci.

[8] Roberto Todeschini, et al. Handbook of Molecular Descriptors, 2002.

[9] Peter Tiño, et al. Nonlinear Prediction of Quantitative Structure-Activity Relationships, 2004, J. Chem. Inf. Model.

[10] Gunnar Rätsch, et al. Classifying 'Drug-likeness' with Kernel-Based Learning Methods, 2005, J. Chem. Inf. Model.

[11] Carl E. Rasmussen, et al. A Unifying View of Sparse Approximate Gaussian Process Regression, 2005, J. Mach. Learn. Res.

[12] Pierre Bruneau, et al. logD7.4 Modeling Using Bayesian Regularized Neural Networks. Assessment and Correction of the Errors of Prediction, 2006, J. Chem. Inf. Model.

[13] Klaus-Robert Müller, et al. Accurate Solubility Prediction with Error Bars for Electrolytes: A Machine Learning Approach, 2007, J. Chem. Inf. Model.