Predicting cytotoxicity from heterogeneous data sources with Bayesian learning

Background: We collected data from over 80 different cytotoxicity assays, from Pfizer in-house work as well as from public sources, and investigated the feasibility of using these datasets to derive a general cytotoxicity model. The datasets come from a variety of assay formats that differ, for instance, in measured endpoint, incubation time and cell type. Our main aim was to derive a computational model from this data that can highlight potentially cytotoxic series early in the drug discovery process.

Results: We developed a Bayesian model for each assay using Scitegic FCFP_6 fingerprints together with the default physical property descriptors. Pairs of mutually predictive assays were identified by calculating the ROC score obtained when the model derived from one assay predicts the experimental outcome of the other, and vice versa. The prediction pairs were visualised in a network in which nodes are assays and edges are drawn for ROC scores >0.60 in both directions. We observed that, if assay pairs (A, B) and (B, C) were mutually predictive, this was often not the case for the pair (A, C). The results from 48 assays connected to each other were merged into one training set of 145590 compounds, and a general cytotoxicity model was derived. The model was cross-validated and also validated with a set of 89 FDA-approved drug compounds.

Conclusions: We have generated a predictive model for general cytotoxicity which could speed up the drug discovery process in multiple ways. Firstly, this analysis has shown that the outcomes of different assay formats can be mutually predictive, thus removing the need to submit a potentially toxic compound to multiple assays. Furthermore, this analysis enables selection of (a) the easiest-to-run assay as corporate standard, or (b) the most descriptive panel of assays, by including assays whose outcomes are not mutually predictive.
The model is no replacement for a cytotoxicity assay, but it opens the opportunity to be more selective about which compounds are submitted to one. On a more mundane level, having data from more than 80 assays in one dataset answers, for the first time, the question: "What are the known cytotoxic compounds from the Pfizer compound collection?" Finally, having a predictive cytotoxicity model will assist the design of new compounds with a desired cytotoxicity profile, since comparison of the model output with data from an in vitro safety/toxicology assay suggests that one is predictive of the other.
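
The mutual-predictivity network described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the rank-based ROC calculation and the toy cross-prediction scores are assumptions made for illustration, and only the 0.60 threshold applied in both directions comes from the text. Treating "connected to each other" as membership of the same connected component is likewise an assumed reading.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """Rank-based ROC score (area under the curve); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both active and inactive compounds are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mutual_edges(cross_auc, threshold=0.60):
    """Return assay pairs whose ROC score exceeds the threshold in BOTH directions.

    cross_auc[(a, b)] is the ROC score of the model trained on assay a
    when it predicts the outcomes of assay b (hypothetical data layout).
    """
    assays = sorted({name for pair in cross_auc for name in pair})
    edges = set()
    for a, b in combinations(assays, 2):
        forward = cross_auc.get((a, b), 0.0)
        backward = cross_auc.get((b, a), 0.0)
        if forward > threshold and backward > threshold:
            edges.add((a, b))
    return edges


def connected_components(nodes, edges):
    """Group assays that are linked, directly or indirectly, by mutual edges."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy scores, invented for illustration: (A, B) and (B, C) are mutually
# predictive, (A, C) is not, and D is predictive in one direction only.
toy_auc = {
    ("A", "B"): 0.72, ("B", "A"): 0.68,
    ("B", "C"): 0.65, ("C", "B"): 0.70,
    ("A", "C"): 0.55, ("C", "A"): 0.58,
    ("A", "D"): 0.75, ("D", "A"): 0.52,
}
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
edges = mutual_edges(toy_auc)
components = connected_components({"A", "B", "C", "D"}, edges)
```

In this toy example A and C end up in the same component through B even though they share no edge, which mirrors the observation that mutual predictivity is often not transitive. Whether assays linked only indirectly should be merged into one training set is a design choice that the sketch does not settle.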
