ChemMine Tools: an online service for analyzing and clustering small molecules

ChemMine Tools is an online service for small molecule data analysis. It provides a web interface to a set of cheminformatics and data mining tools that support analysis routines commonly performed in chemical genomics and drug discovery. The service also offers programmatic access via the R library ChemmineR. The primary functionalities of ChemMine Tools fall into five application areas: data visualization, structure comparison, similarity searching, compound clustering and prediction of chemical properties. First, users can upload compound data sets to the online Compound Workbench, which provides utilities for compound viewing, structure drawing and format interconversion. Second, pairwise structural similarities among compounds can be quantified. Third, interfaces to ultra-fast structure similarity search algorithms, including fingerprint and embedding/indexing algorithms, are available to efficiently mine the chemical space in the public domain. Fourth, the service includes a Clustering Toolbox that integrates cheminformatic algorithms with data mining utilities to enable systematic structure- and activity-based analyses of custom compound sets. Fifth, physicochemical property descriptors of custom compound sets can be calculated. These descriptors are important for assessing the bioactivity profiles of compounds in silico and for quantitative structure-activity relationship (QSAR) analyses. ChemMine Tools is available at: http://chemmine.ucr.edu.
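To make the notions of pairwise structural similarity and fingerprint-based similarity searching concrete, the following is a minimal Python sketch. It assumes each compound is represented as a set of structural feature identifiers and uses the Tanimoto coefficient, a widely used similarity measure; the abstract does not state which measure or descriptor type the service applies, and the function names, feature values and cutoff here are hypothetical illustrations rather than part of ChemMine Tools or ChemmineR.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared features divided by all distinct features."""
    union = a | b
    if not union:
        # Assumption: two empty feature sets are treated as dissimilar.
        return 0.0
    return len(a & b) / len(union)


def similarity_search(
    query: Set[int],
    library: Dict[str, Set[int]],
    cutoff: float = 0.4,
) -> List[Tuple[str, float]]:
    """Return (compound id, score) pairs scoring at least `cutoff`, best first."""
    hits = [(cid, tanimoto(query, feats)) for cid, feats in library.items()]
    hits = [(cid, score) for cid, score in hits if score >= cutoff]
    # Sort by descending score, then by id so ties are ordered deterministically.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical feature sets standing in for real structural descriptors.
library = {
    "cmp_A": {1, 2, 3},
    "cmp_B": {2, 3, 4},
    "cmp_C": {7, 8},
}
query = {1, 2, 3}

results = similarity_search(query, library, cutoff=0.4)
for compound_id, score in results:
    print(f"{compound_id}\t{score:.2f}")
```

This linear scan compares the query against every library entry; the embedding/indexing algorithms mentioned above exist precisely to avoid that cost on large public compound collections.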
