Cheminformatic analysis of high-throughput compound screens.

This article gives an overview of basic computational methods commonly used to analyze small-molecule screening data in chemical genomics. First, we introduce cheminformatic concepts for analyzing drug-like small-molecule structures and their properties. Second, we introduce compound selection approaches for assembling screening libraries based on compound property and diversity analyses. Finally, we discuss methods for interpreting screening hits by analyzing compound structures and induced phenotypes with similarity search and clustering approaches. These steps are critical for optimizing screening hits and for relating structure to bioactivity and phenotype.
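
As a rough illustration of the similarity search step mentioned above, the following minimal sketch ranks library compounds against a query hit. It assumes that each compound has already been reduced to a set of structural feature identifiers (a fingerprint) and that the Tanimoto coefficient is the similarity measure; the abstract itself does not name a specific descriptor or coefficient, so both are assumptions. The function names, compound names, and the cutoff value are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similarity_search(query, library, cutoff=0.5):
    """Return (name, score) pairs with score >= cutoff, best match first.

    `library` maps compound names to feature sets (hypothetical layout).
    """
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= cutoff]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints: integers stand in for structural feature identifiers.
library = {
    "cmpA": {1, 2, 3, 4},
    "cmpB": {2, 3, 4},
    "cmpC": {7, 8, 9},
}
hits = similarity_search({1, 2, 3, 4}, library, cutoff=0.5)
```

In practice the feature sets would come from a cheminformatics toolkit rather than hand-written integers, and the cutoff would be tuned to the descriptor type in use.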
