Reliable Performance for Streaming Analysis Workflows

Workflow systems are in wide use in the scientific community today, facilitating complex computational and analytical processes. Their increasing popularity is particularly visible at workflow-sharing sites such as MyExperiment [1] and Galaxy [2-4]. High-performance computing (HPC) users are also looking toward workflow solutions to manage their complex pre- and post-processing needs. This trend will likely continue with the advent of exascale architectures, which will require extreme-scale collaboration between applications running on an exascale system and the community data and knowledge repositories needed for their validation and steering [5, 6]. An emerging use for workflows is the in-situ/streaming, often adaptive, analysis of large-scale simulation runs, as well as the analysis and interpretation of experimental results [10]; in both cases the aim is to steer the scientific work and optimize the scientific outcome. In this last case in particular, reliable performance of the workflow is key to its usefulness.