Paper Information - Komadu: A Capture and Visualization System for Scientific Data Provenance

Komadu: A Capture and Visualization System for Scientific Data Provenance

Data provenance captured from scientific applications is a critical precursor to data sharing and reuse. For researchers who want to repurpose data, provenance documents the lineage and attribution of a data set, which is needed to establish trust in it. Komadu is a standalone provenance capture and visualization system that captures, represents, and manipulates provenance coming from scientific tools, infrastructures, and repositories. It represents provenance using the W3C PROV standard [1] and is the successor to the Karma provenance capture system [2], which was based on the Open Provenance Model (OPM) [3]. Komadu provides two interfaces: a Web Services interface based on Apache Axis2 [4] and a messaging interface based on RabbitMQ [5]. Komadu is fully open source, and its source code is publicly available on GitHub [6]. Although Komadu has been used most extensively in scientific research, its interfaces are designed to collect and visualize provenance from any kind of application that needs it.
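To make the idea of lineage and attribution concrete, the following is a minimal sketch of a PROV-style provenance record and a lineage query over it. It is not Komadu's API or its storage format: the relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`) come from the W3C PROV data model, while the dictionary layout, the `ex:` identifiers, and the helper functions `build_document`, `lineage`, and `agents_for` are assumptions made purely for illustration.

```python
import json
from collections import deque


def build_document():
    """Build a toy provenance record (hypothetical identifiers, not Komadu output)."""
    return {
        "entity": {"ex:raw_data": {}, "ex:cleaned_data": {}, "ex:plot": {}},
        "activity": {"ex:clean": {}, "ex:render": {}},
        "agent": {"ex:alice": {}},
        # PROV relation: an activity consumed an entity.
        "used": [
            {"activity": "ex:clean", "entity": "ex:raw_data"},
            {"activity": "ex:render", "entity": "ex:cleaned_data"},
        ],
        # PROV relation: an entity was produced by an activity.
        "wasGeneratedBy": [
            {"entity": "ex:cleaned_data", "activity": "ex:clean"},
            {"entity": "ex:plot", "activity": "ex:render"},
        ],
        # PROV relation: an agent was responsible for an activity.
        "wasAssociatedWith": [
            {"activity": "ex:clean", "agent": "ex:alice"},
            {"activity": "ex:render", "agent": "ex:alice"},
        ],
    }


def lineage(doc, entity_id):
    """Return the upstream entities of entity_id in breadth-first order."""
    generated_by = {r["entity"]: r["activity"] for r in doc["wasGeneratedBy"]}
    inputs = {}
    for r in doc["used"]:
        inputs.setdefault(r["activity"], []).append(r["entity"])

    seen, order, queue = {entity_id}, [], deque([entity_id])
    while queue:
        current = queue.popleft()
        activity = generated_by.get(current)  # None for source entities
        for source in inputs.get(activity, []):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


def agents_for(doc, entity_id):
    """Return the sorted agents associated with the activity that generated entity_id."""
    generated_by = {r["entity"]: r["activity"] for r in doc["wasGeneratedBy"]}
    activity = generated_by.get(entity_id)
    return sorted(
        r["agent"] for r in doc["wasAssociatedWith"] if r["activity"] == activity
    )


if __name__ == "__main__":
    doc = build_document()
    print(json.dumps(doc, indent=2))
    print(lineage(doc, "ex:plot"))
    print(agents_for(doc, "ex:plot"))
```

Walking `wasGeneratedBy` and then `used` backwards from a data product is the kind of query a researcher would run before reusing that product: it answers which inputs the result depends on and who was responsible for producing it.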

[1] Yolanda Gil, et al. PROV-DM: The PROV Data Model, 2013.

[2] Geoffrey C. Fox, et al. Twister: a runtime for iterative MapReduce, 2010, HPDC '10.

[3] Yogesh L. Simmhan, et al. The Open Provenance Model core specification (v1.1), 2011, Future Gener. Comput. Syst.

[4] Frederic P. Miller, et al. Apache Maven, 2010.

[5] Sean Bechhofer, et al. Research Objects: Towards Exchange and Reuse of Digital Knowledge, 2010.

[6] Yogesh L. Simmhan, et al. A Framework for Collecting Provenance in Data-Centric Scientific Workflows, 2006, 2006 IEEE International Conference on Web Services (ICWS'06).