The opportunities of mining historical and collective data in drug discovery.

Vast amounts of bioactivity data have been generated for small molecules across public and corporate domains. Biological signatures, derived either from systematic profiling efforts or from existing historical assay data, have been successfully employed for small-molecule mechanism-of-action elucidation, drug repositioning, hit expansion and screening subset design. We review the different types of biological descriptors and their applications, and demonstrate how biological data can outlive the original purpose or project for which they were generated. By comparing 150 high-throughput screening (HTS) campaigns run at Novartis over the past decade on the basis of their active and inactive chemical matter, we highlight the opportunities and challenges associated with cross-project learning in drug discovery.
