Paper information - ChemmineR: a compound mining framework for R

Motivation: Software applications for structural similarity searching and clustering of small molecules play an important role in drug discovery and chemical genomics. Here, we present the first open-source compound mining framework for the popular statistical programming environment R. The integration with a powerful statistical environment maximizes the flexibility, expandability and programmability of the provided analysis functions.

Results: We discuss the algorithms and compound mining utilities provided by the R package ChemmineR. It contains functions for structural similarity searching, for clustering compound libraries with a wide spectrum of classification algorithms, and various utilities for managing complex compound data. It also offers a wide range of visualization functions for compound clusters and chemical structures. The package is well integrated with the online ChemMine environment and allows bidirectional communication between the two services.

Availability: ChemmineR is freely available as an R package from the ChemMine project site: http://bioweb.ucr.edu/ChemMineV2/chemminer

Contact: thomas.girke@ucr.edu
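To make the idea of structural similarity searching concrete, the following is a minimal sketch in Python, not ChemmineR code and not the package's API. It assumes each compound has already been reduced to a set of structural feature strings and ranks library compounds against a query by the Tanimoto coefficient. The choice of coefficient, the function names, the cutoff and the toy descriptor strings are all assumptions made for illustration; none of them is taken from the abstract.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient: shared features divided by features in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(
    query: FrozenSet[str],
    library: Dict[str, FrozenSet[str]],
    cutoff: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs with score >= cutoff, best match first."""
    scored = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    hits = [(name, score) for name, score in scored if score >= cutoff]
    # Sort by descending score, then by name so ties are ordered deterministically.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical toy library: the descriptor strings are invented placeholders,
# not the output of any real descriptor generator.
LIBRARY: Dict[str, FrozenSet[str]] = {
    "cmp1": frozenset({"C-C:1", "C-O:1", "C-O:2", "C-N:3"}),
    "cmp2": frozenset({"C-C:1", "C-O:1", "C-O:2"}),
    "cmp3": frozenset({"C-S:1", "C-Cl:2"}),
}

if __name__ == "__main__":
    for name, score in similarity_search(LIBRARY["cmp1"], LIBRARY, cutoff=0.5):
        print(f"{name}\t{score:.2f}")
```

The same pairwise scores could also feed a clustering step, for example by grouping compounds whose similarity exceeds a threshold; that extension is left out to keep the sketch short.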
