Provenance Management for Neuroimaging Workflows in neuGrid

A growing amount of large-scale, collaborative biomedical research is now conducted on e-Science infrastructures. Such research typically involves comparative analyses of large volumes of data in the search for disease biomarkers. Running these analyses manually is cumbersome, labour-intensive and error-prone, so significant work has been invested in automating them with appropriately configured workflows. Biomedical researchers also need to validate analysis outcomes, ensure that results are reproducible and establish the ownership of specific scientific results. The detailed, traceable information required for this is often referred to as provenance data. Developing suitable methods for managing provenance data in large-scale distributed e-Science environments is an active area of research. We present the approach adopted in the neuGRID project, which aims to develop an infrastructure to facilitate research into neurodegenerative diseases such as Alzheimer's. To automate complex, large-scale analyses in neuGRID, we have adapted CRISTAL, a workflow and provenance tracking solution. CRISTAL provides a rich environment in which neuroscientists can track and manage the evolution of both data and workflows in the neuGRID infrastructure.
