DemQSAR: predicting human volume of distribution and clearance of drugs

In silico methods that characterize molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure–activity/-property relationship (QSAR/QSPR) models correlate the structure and physico-chemical properties of molecular compounds with a specific functional activity or property under study. Typically, a large number of molecular features are generated for the compounds. In many cases, the number of generated features exceeds the number of molecular compounds with known property values that are available for learning. Machine learning methods tend to overfit the training data in such situations, i.e. the method adjusts to very specific features of the training data that are not characteristic of the considered property. This problem can be alleviated by diminishing the influence of unimportant, redundant or even misleading features. A better strategy is to eliminate such features completely. Ideally, a molecular property can be described by a small number of features that are chemically interpretable. The purpose of the present contribution is to provide a predictive modeling approach that combines feature generation, feature selection, model building and control of overtraining in a single application called DemQSAR. DemQSAR is used to predict human volume of distribution (VDss) and human clearance (CL). To control overtraining, quadratic and linear regularization terms are employed. A recursive feature selection approach is used to reduce the number of descriptors. The prediction performance matches the best predictions reported in the recent literature. The example presented here demonstrates that DemQSAR can generate a model that uses very few features while maintaining high predictive power.
A standalone DemQSAR Java application for building models of any user-defined property and a web interface for predicting human VDss and CL are available on the DemPRED webpage: http://agknapp.chemie.fu-berlin.de/dempred/.
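
As an illustration only, the combination of linear and quadratic regularization with recursive feature selection described above can be sketched in a few lines. This is not the DemQSAR implementation (which is a Java application). It assumes an elastic-net style objective fitted by coordinate descent, and a backward elimination that drops the feature with the smallest absolute weight in each round. All function names, penalty values and the synthetic data are hypothetical, and features are assumed to be roughly standardized so that weights are comparable.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; this is what the linear (L1) penalty does."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net(X, y, l1=0.01, l2=0.01, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - Xw||^2 + l1*||w||_1 + (l2/2)*||w||_2^2  (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, l1) / (col_sq[j] + l2)
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def recursive_feature_elimination(X, y, n_keep, **penalties):
    """Refit repeatedly, removing the feature with the smallest |weight|."""
    active = list(range(len(X[0])))
    while True:
        X_active = [[row[j] for j in active] for row in X]
        w = fit_elastic_net(X_active, y, **penalties)
        if len(active) <= n_keep:
            return active, w
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]


def make_data(n=40, p=6, seed=0):
    """Synthetic stand-in for a descriptor matrix: only features 0 and 3 matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 3.0 * row[3] + 0.1 * rng.gauss(0.0, 1.0) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    selected, weights = recursive_feature_elimination(X, y, n_keep=2)
    print("selected feature indices:", selected)
    print("weights:", weights)
```

In this sketch the quadratic term stabilizes the fit when features outnumber compounds or are correlated, the linear term drives uninformative weights to exactly zero, and the elimination loop removes such features from the model altogether.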
