Genetic Programming and Other Machine Learning Approaches to Predict Median Oral Lethal Dose (LD50) and Plasma Protein Binding Levels (%PPB) of Drugs

Computational methods that reliably predict the pharmacokinetics of newly synthesized compounds are critical for drug discovery and development. Here we present an empirical study of several versions of Genetic Programming and other well-known Machine Learning techniques for predicting Median Oral Lethal Dose (LD50) and Plasma Protein Binding (%PPB) levels. Since these two parameters characterize, respectively, the harmful effects of a drug and its distribution in the human body, their accurate prediction is essential for selecting effective molecules. The results confirm that Genetic Programming is a promising technique for predicting pharmacokinetic parameters, in terms of both accuracy and generalization ability.
