Abstract: Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, what matters is not the mere existence of data but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, the lab notebook captured the steps of scientific workflows and their metadata, but the digital era has instead fragmented data, processing, and annotation. This paper presents the Metadata, Provenance, and Ontology (MPO) System, software that automates the documentation of scientific workflows and their associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of a workflow, presented in notebook form and augmented with directed acyclic graphs. A set of web-based graphical navigation tools and an Application Programming Interface (API) have been created for searching, browsing, and programmatically accessing the workflows and data. We describe the MPO concepts and software architecture, and report the current status of the software and the initial deployment experience.