LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance

Provenance traces captured by scientific workflows can be useful for designing, debugging and maintaining workflows. However, our experience suggests that they are of limited use for reporting results. This is partly because traces do not include the domain-specific annotations needed to explain results, and partly because some workflow activities are black boxes. We show that, with basic mark-up of the data processing within activities and a set of domain-specific label generation functions, standard workflow provenance can be utilised as a platform for labelling data artefacts. These labels can in turn aid the selection of data subsets and serve as a proxy for the data descriptors of shared datasets.
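To make the labelling idea concrete, here is a minimal Python sketch of generating and propagating labels over a provenance trace. It is not LabelFlow's implementation. The trace format, the `Activity` record, the `params` dictionary standing in for activity mark-up, the label functions, and the helpers `label_artefacts` and `select` are all hypothetical. The sketch also assumes that activities are recorded in execution order and that every output inherits the labels of its inputs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Activity:
    """One executed step in a hypothetical provenance trace."""
    name: str
    inputs: List[str]
    outputs: List[str]
    # Assumed "mark-up": parameters the activity exposes about its processing.
    params: Dict[str, str] = field(default_factory=dict)


# Hypothetical label generation function: activity record -> domain labels.
LabelFn = Callable[[Activity], Set[str]]


def label_artefacts(trace: List[Activity],
                    label_fns: Dict[str, LabelFn]) -> Dict[str, Set[str]]:
    """Assumes `trace` is in execution (topological) order. Each output
    inherits the labels of the inputs it was derived from, plus any labels
    generated for the activity that produced it."""
    labels: Dict[str, Set[str]] = {}
    for act in trace:
        inherited: Set[str] = set()
        for src in act.inputs:
            inherited |= labels.get(src, set())
        fn = label_fns.get(act.name)
        generated = fn(act) if fn else set()
        for out in act.outputs:
            labels[out] = inherited | generated
    return labels


def select(labels: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Return the artefacts that carry every label in `required`."""
    return sorted(a for a, ls in labels.items() if required <= ls)


# Toy trace: two downloads from different sites, each followed by a QC filter.
trace = [
    Activity("download", [], ["raw_A"], {"site": "A"}),
    Activity("download", [], ["raw_B"], {"site": "B"}),
    Activity("qc_filter", ["raw_A"], ["clean_A"], {"threshold": "0.9"}),
    Activity("qc_filter", ["raw_B"], ["clean_B"], {"threshold": "0.9"}),
]
label_fns: Dict[str, LabelFn] = {
    "download": lambda a: {"site:" + a.params["site"]},
    "qc_filter": lambda a: {"qc>=" + a.params["threshold"]},
}

labels = label_artefacts(trace, label_fns)
print(select(labels, {"site:A"}))   # ['clean_A', 'raw_A']
print(select(labels, {"qc>=0.9"}))  # ['clean_A', 'clean_B']
```

Because labels flow along derivation edges, a downstream artefact such as `clean_A` can be selected by a property of an upstream step (its originating site) without re-running or inspecting the workflow. The label functions are the only domain-specific part; the propagation itself uses nothing beyond the standard input/output structure of the trace.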
