Komadu: A Capture and Visualization System for Scientific Data Provenance

Data provenance captured from scientific applications is a critical precursor to data sharing and reuse. For researchers who want to repurpose data, provenance describes the lineage and attribution of a data set, information that is needed to establish trust in it. Komadu is a standalone system for capturing, representing, visualizing, and manipulating provenance from scientific tools, infrastructures, and repositories. It represents provenance using the W3C PROV standard [1] and is the successor to the Karma [2] provenance capture system, which was based on the Open Provenance Model (OPM) [3]. Komadu offers two interfaces: a Web Services interface based on Apache Axis2 [4] and a messaging interface based on RabbitMQ [5]. Komadu is open source, and its source code is publicly available on GitHub [6]. Although Komadu has been used most extensively in scientific research, its interfaces are designed to collect and visualize provenance from any kind of application.
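
To make the notion of lineage concrete, the following is a minimal sketch of how a few W3C PROV relations (prov:used, prov:wasGeneratedBy, prov:wasAttributedTo) could be stored and traversed in memory. It is an illustration only and does not reflect Komadu's actual API or storage model; the class ProvGraph, its method names, and the file and agent names in the usage lines are all hypothetical.

```python
from collections import defaultdict


class ProvGraph:
    """Toy in-memory store for a few W3C PROV relations.

    Hypothetical illustration; this is not Komadu's interface.
    """

    def __init__(self):
        self.generated_by = {}        # entity -> activity (prov:wasGeneratedBy)
        self.used = defaultdict(set)  # activity -> entities (prov:used)
        self.attributed_to = {}       # entity -> agent (prov:wasAttributedTo)

    def was_generated_by(self, entity, activity):
        self.generated_by[entity] = activity

    def add_used(self, activity, entity):
        self.used[activity].add(entity)

    def was_attributed_to(self, entity, agent):
        self.attributed_to[entity] = agent

    def lineage(self, entity):
        """Return every upstream entity that the given entity depends on."""
        seen = set()
        stack = [entity]
        while stack:
            current = stack.pop()
            activity = self.generated_by.get(current)
            if activity is None:
                continue  # no recorded generating activity: a root entity
            for source in self.used.get(activity, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


# Usage: a two-step workflow, raw.csv -> clean.csv -> figure.png
graph = ProvGraph()
graph.was_attributed_to("raw.csv", "alice")
graph.add_used("calibrate", "raw.csv")
graph.was_generated_by("clean.csv", "calibrate")
graph.add_used("plot", "clean.csv")
graph.was_generated_by("figure.png", "plot")
upstream = graph.lineage("figure.png")
```

Walking backward through generation and usage edges in this way is what lets a researcher trace a derived data set to its sources and to the agents those sources are attributed to.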