The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows

The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. Instead, all steps of the analysis are described by a regular Matlab data structure, which documents each step's command and options, as well as its input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility, as long as the dependencies between jobs allow it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and of the execution history, detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under the open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install, and supports a variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database of 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources.
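The idea of describing a pipeline as plain data can be illustrated outside Matlab. In this approach, dependencies are implied by the files each job reads and writes. The Python sketch below is an analogue written under stated assumptions; it is not PSOM code. The job fields (`command`, `opt`, `files_in`, `files_out`), job names, and commands are hypothetical, and cleaned-up files are omitted. The dependency rule "job B depends on job A when B reads a file that A writes" is an assumption for illustration. Jobs are then grouped into batches whose members could run in parallel.

```python
# Minimal sketch (not PSOM code): a pipeline described as plain data, with job
# dependencies inferred from shared files and grouped into parallel batches.
from graphlib import TopologicalSorter  # Python 3.9+


def build_pipeline(subjects):
    """Hypothetical fMRI-style pipeline: per-subject motion correction and
    smoothing, then one group-level job. Commands and options are made up."""
    pipeline = {}
    for s in subjects:
        pipeline[f"motion_{s}"] = {
            "command": "motion_correct",
            "opt": {"ref_volume": 0},
            "files_in": [f"{s}_raw.nii"],
            "files_out": [f"{s}_mc.nii"],
        }
        pipeline[f"smooth_{s}"] = {
            "command": "smooth",
            "opt": {"fwhm_mm": 6},
            "files_in": [f"{s}_mc.nii"],
            "files_out": [f"{s}_sm.nii"],
        }
    pipeline["group_avg"] = {
        "command": "average",
        "opt": {},
        "files_in": [f"{s}_sm.nii" for s in subjects],
        "files_out": ["group_avg.nii"],
    }
    return pipeline


def dependencies(pipeline):
    """Assumed rule: job B depends on job A when B reads a file that A writes.
    Input files produced by no job (raw data) create no dependency."""
    producer = {f: name for name, job in pipeline.items() for f in job["files_out"]}
    return {
        name: {producer[f] for f in job["files_in"] if f in producer}
        for name, job in pipeline.items()
    }


def parallel_batches(pipeline):
    """Return successive batches of jobs. Every job in a batch has its
    dependencies satisfied by earlier batches, so the batch could run concurrently."""
    sorter = TopologicalSorter(dependencies(pipeline))
    sorter.prepare()  # raises graphlib.CycleError if the file graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for batch in parallel_batches(build_pipeline(["s01", "s02"])):
    print(batch)
```

With two subjects, the loop prints three batches: the two motion-correction jobs, then the two smoothing jobs, then the group average. A real scheduler could start each job as soon as its own parents finish, rather than waiting for a whole batch to complete. Batching only keeps the sketch short.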
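Service (3), executing only the parts of a pipeline that need to be reprocessed, can be sketched in the same spirit. This Python example assumes a simple rule that is not taken from PSOM. Under that rule, a job is re-run if its description (command, options, files) differs from the one recorded when it last completed, or if no completion was recorded. Every job downstream of such a job is re-run too. The `history` dictionary stands in for the execution record. Its layout, the helper names, and the SHA-256 fingerprinting are hypothetical choices for illustration.

```python
# Minimal sketch (not PSOM code): choose which jobs to re-run when a pipeline
# is started again, given a record of previously completed job descriptions.
import hashlib
import json


def job_hash(job):
    """Deterministic fingerprint of a job description (command, options, files)."""
    blob = json.dumps(job, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def children(pipeline):
    """Map each job to the jobs that read one of its output files."""
    producer = {f: name for name, job in pipeline.items() for f in job["files_out"]}
    kids = {name: set() for name in pipeline}
    for name, job in pipeline.items():
        for f in job["files_in"]:
            if f in producer:
                kids[producer[f]].add(name)
    return kids


def jobs_to_run(pipeline, history):
    """history: job name -> hash of the description that last completed.
    Assumed rule: re-run new, modified, or never-completed jobs, plus
    everything downstream of them."""
    stale = {name for name, job in pipeline.items()
             if history.get(name) != job_hash(job)}
    kids = children(pipeline)
    stack = list(stale)
    while stack:  # propagate staleness to all descendants
        for child in kids[stack.pop()]:
            if child not in stale:
                stale.add(child)
                stack.append(child)
    return stale


pipeline = {
    "a": {"command": "step_a", "opt": {"x": 1}, "files_in": ["raw.dat"], "files_out": ["a.dat"]},
    "b": {"command": "step_b", "opt": {}, "files_in": ["a.dat"], "files_out": ["b.dat"]},
    "c": {"command": "step_c", "opt": {}, "files_in": ["raw.dat"], "files_out": ["c.dat"]},
}
history = {name: job_hash(job) for name, job in pipeline.items()}  # all jobs completed
pipeline["a"]["opt"]["x"] = 2  # change one option of job "a"
print(sorted(jobs_to_run(pipeline, history)))
```

Changing one option of job `a` marks `a` and its descendant `b` for re-execution. The independent job `c` is left untouched, so only the affected branch is reprocessed.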
