Structure-activity models of oral clearance, cytotoxicity, and LD50: a screen for promising anticancer compounds

Background
Quantitative structure-activity relationship (QSAR) models have become popular tools for identifying promising lead compounds in anticancer drug development. However, few QSAR studies have investigated multitask learning, an approach that allows distinct but related data sets to be used jointly in training. In this paper, a suite of three QSAR models is developed to identify compounds that are likely to (a) exhibit cytotoxic behavior against cancer cells, (b) exhibit high rat LD50 values (low systemic toxicity), and (c) exhibit low to modest human oral clearance (favorable pharmacokinetic characteristics). Models were constructed using Kernel Multitask Latent Analysis (KMLA), an approach that can effectively handle a large number of correlated data features, nonlinear relationships between features and responses, and multitask learning. Multitask learning is particularly useful when the number of available training records is small relative to the number of features, as was the case with the oral clearance data.

Results
Multitask learning modestly but significantly improved the classification precision of the oral clearance model. For the cytotoxicity model, which was constructed using a large number of records, multitask learning did not affect precision but did reduce computation time. The models developed here were used to predict activities for 115,000 natural compounds. Hundreds of natural compounds, particularly in the anthraquinone and flavonoid groups, were predicted to be cytotoxic, have high LD50 values, and have low to moderate oral clearance.

Conclusion
Multitask learning can be useful in some QSAR models. A suite of QSAR models was constructed and used to screen a large drug library for compounds likely to be cytotoxic to multiple cancer cell lines in vitro, have low systemic toxicity in rats, and have favorable pharmacokinetic properties in humans.
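KMLA extracts latent features from a kernel matrix in a way related to kernel partial least squares (PLS). The sketch below is not KMLA. It is a simplified stand-in, NIPALS-style kernel PLS after Rosipal and Trejo, meant only to show the multitask idea: one set of latent scores is extracted from the compound kernel matrix and shared by several response columns, one per task. It assumes a Gaussian (RBF) kernel, real-valued responses (a classifier would threshold them), and, unlike the paper's distinct data sets, that every compound carries a label for every task. The class name `MultiTaskKernelPLS` and its parameters are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class MultiTaskKernelPLS:
    """Hypothetical sketch: kernel PLS with one response column per task.

    A single set of orthonormal latent scores T is extracted from the centered
    kernel matrix and shared by all tasks (a stand-in for KMLA, not KMLA itself).
    """

    def __init__(self, n_components=3, gamma=1.0, max_iter=500, tol=1e-10):
        self.n_components = n_components
        self.gamma = gamma
        self.max_iter = max_iter
        self.tol = tol

    def fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        if Y.ndim == 1:
            Y = Y[:, None]
        n = X.shape[0]
        self.X_ = X
        K = rbf_kernel(X, X, self.gamma)
        # Training statistics needed later to center test kernels consistently.
        self.K_col_mean_ = K.mean(axis=0)
        self.K_mean_ = K.mean()
        C = np.eye(n) - np.ones((n, n)) / n
        Kc = C @ K @ C
        self.y_mean_ = Y.mean(axis=0)
        Yc = Y - self.y_mean_

        Kd, Yd = Kc.copy(), Yc.copy()
        T, U = [], []
        for _ in range(self.n_components):
            # Start from the task column with the largest remaining variance.
            u = Yd[:, [int(np.argmax((Yd ** 2).sum(axis=0)))]]
            for _ in range(self.max_iter):
                t = Kd @ u
                t /= np.linalg.norm(t)
                u_new = Yd @ (Yd.T @ t)  # combines all tasks' responses
                u_new /= np.linalg.norm(u_new)
                converged = np.linalg.norm(u_new - u) < self.tol
                u = u_new
                if converged:
                    break
            t = Kd @ u  # keep t consistent with the final u
            t /= np.linalg.norm(t)
            # Deflate: remove the extracted direction from K and from every task's Y.
            P = np.eye(n) - t @ t.T
            Kd = P @ Kd @ P
            Yd = Yd - t @ (t.T @ Yd)
            T.append(t)
            U.append(u)
        T, U = np.hstack(T), np.hstack(U)
        # Dual coefficients (Rosipal and Trejo form): alpha = U (T'KU)^-1 T'Y
        self.alpha_ = U @ np.linalg.solve(T.T @ Kc @ U, T.T @ Yc)
        return self

    def predict(self, X_new):
        Kt = rbf_kernel(np.asarray(X_new, dtype=float), self.X_, self.gamma)
        # Center the test kernel with the training-set statistics.
        Kt_c = Kt - Kt.mean(axis=1, keepdims=True) - self.K_col_mean_ + self.K_mean_
        return Kt_c @ self.alpha_ + self.y_mean_
```

Because the shared latent scores are orthonormal and nested, adding components can only lower the training residual, so in practice the number of components would be chosen by cross-validation rather than by training fit.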

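The screen keeps only compounds that all three models flag as favorable, and the models are compared by classification precision. A minimal sketch of both steps follows, taking the favorable class as positive so that precision is TP / (TP + FP). The abstract does not say whether the models emit probabilities or which cutoff was used, so the `Prediction` record, its field names, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Prediction:
    """Hypothetical per-compound output of the three models."""
    compound_id: str
    p_cytotoxic: float      # probability of cytotoxicity against cancer cells
    p_high_ld50: float      # probability of a high rat LD50 (low systemic toxicity)
    p_low_clearance: float  # probability of low-to-moderate human oral clearance


def screen(preds: Iterable[Prediction], threshold: float = 0.5) -> List[str]:
    """Return IDs of compounds that pass all three criteria (intersection)."""
    return [
        p.compound_id
        for p in preds
        if min(p.p_cytotoxic, p.p_high_ld50, p.p_low_clearance) >= threshold
    ]


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted positives that are true positives: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p and t)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p and not t)
    return tp / (tp + fp) if tp + fp else 0.0
```

For example, a compound scored (0.9, 0.8, 0.7) passes the default screen, while one scored (0.9, 0.4, 0.9) is dropped because it fails the LD50 criterion.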