Bio-Swarm-Pipeline: A Light-Weight, Extensible Batch Processing System for Efficient Biomedical Data Processing

A streamlined scientific workflow system that can track the details of data processing history is critical for the efficient handling of the fundamental routines used in scientific research. In the scientific workflow research community, the information that describes this processing history is referred to as “provenance,” and it plays an important role in most existing workflow management systems. Despite its importance, provenance modeling and management is still a relatively new area in this community. The proper scope, representation, granularity, and implementation of a provenance model can vary from domain to domain and pose a number of challenges for efficient pipeline design. This paper provides a case study on structured provenance modeling and management problems in the neuroimaging domain by introducing the Bio-Swarm-Pipeline. This new model, which is evaluated in the paper through real-world scenarios, systematically addresses the provenance scope, representation, granularity, and implementation issues related to the neuroimaging domain. Although the model stems from applications in neuroimaging, the system can potentially be adapted to a wide range of biomedical application scenarios.
