Scientific workflow systems: Pipeline Pilot and KNIME

There are many examples of scientific workflow systems [1, 2]; in this short article I will concentrate only on cheminformatics applications and the workflow tools most commonly used in cheminformatics, namely Pipeline Pilot [3] and KNIME [4]. Workflow solutions have been used for years in bioinformatics and other sciences, and some also have applications in so-called “business intelligence” and “predictive analytics”. Readers can find details of Discovery Net, Galaxy, Kepler, Triana, SOMA, SMILA, VisTrails, and others on the Web.

Kappler has compared Competitive Workflow, Taverna, and Pipeline Pilot [5]. Taverna has been widely used in bioinformatics but is also used with the Chemistry Development Kit (CDK) [6, 7]. CDK-Taverna workflows are made freely available at myExperiment.org [8], which also includes KNIME workflows.

Discovery Net was one of the earliest examples of a scientific workflow system; its concepts were later commercialized in the InforSense Knowledge Discovery Environment (KDE). My 2007 review [1] centered on Pipeline Pilot and InforSense KDE; KNIME was then a relative newcomer. In 2009 the loss-making InforSense organization was acquired by IDBS, and KDE has since made progress in translational medicine [9]. InforSense’s ChemSense [10] used ChemAxon’s JChem Cartridge, together with ChemAxon’s chemical structure, property prediction, and enumeration tools. ChemSense’s three major pharmaceutical customers have turned to other solutions. The InforSense Suite lives on, but it is not seen as a “personal productivity tool”; rather, it is integrated into the IDBS ELN platform. KNIME and Pipeline Pilot are now the market leaders in personal productivity in cheminformatics.