KNIME-CDK: Workflow-driven cheminformatics

Background: Cheminformaticians routinely have to process and analyse libraries of small molecules. Among other things, this includes standardizing molecules, calculating various descriptors, visualising molecular structures, and downstream analysis. Scientific workflow platforms such as the Konstanz Information Miner (KNIME) can be used for this purpose if provided with the right plug-in. A workflow-based cheminformatics tool offers ease of use and interoperability between complementary cheminformatics packages within the same framework, which facilitates the analysis process.

Results: KNIME-CDK comprises functions for converting molecules to and from common formats and for generating signatures, fingerprints, and molecular properties. It is based on the Chemistry Development Kit (CDK) and uses the Chemical Markup Language (CML) for persistence. A comparison with the cheminformatics plug-in RDKit shows that KNIME-CDK supports a similar range of chemical classes and adds new functionality to the framework. We describe the design and integration of the plug-in, and demonstrate the usage of the nodes on ChEBI, a library of small molecules of biological interest.

Conclusions: KNIME-CDK is an open-source plug-in for the Konstanz Information Miner, a free workflow platform. KNIME-CDK is built on top of the open-source Chemistry Development Kit and allows for efficient cross-vendor structural cheminformatics. Its ease of use and modularity enable researchers to automate routine tasks and data analysis, bringing complementary cheminformatics functionality to the workflow environment.
