Sesame: A new bioinformatics semantic workflow design system

Biologists have become increasingly dependent on bioinformatics tools to analyze and interpret their datasets. The number and variety of these tools have increased dramatically, and the tools themselves have become more computationally complex, expensive and resource intensive. Powerful workflow design systems have been developed to automate the execution of a set of tools for a specific task. However, designing a complex executable workflow with such systems still requires considerable computational expertise or help from a bioinformatics expert. In this paper, we present Sesame, a bioinformatics semantic workflow design system. We have designed a new ontology for bioinformatics tools and services (OBTS) and propose an ontology-driven semantic workflow design mechanism based on it. Unlike an executable workflow, a semantic workflow is expressed at the level of biological concepts, which are closer to the scientific research itself. Decoupling semantic workflow design from executable workflow design, with its computational implementation details, lets biologists design analyses without handling those details themselves. A prototype version of the Sesame system has been implemented and deployed. Sesame is intended to let biologists efficiently perform complex data analyses that address scientific questions.
