Managing Provenance for Medical Datasets - An Example Case for Documenting the Workflow for Image Processing

In this paper, we present a novel data repository architecture capable of handling complex image processing workflows and their associated provenance for clinical image data. Compared with existing systems, it offers several distinguishing properties. The most relevant are flexible and intuitive data and metadata management, including a graph-based provenance management strategy built on a standard provenance model. Annotation is supported to allow flexible text descriptors, which are common in clinical data when structured templates are not yet available. The architecture is based on modern database and management concepts and overcomes the limitations of current systems, namely limited provenance support, lack of flexibility, and poor extensibility to novel requests. To demonstrate the practical applicability of our architecture, we consider a use case of an automated image data processing workflow for identifying vascular lesions in the lower extremities and describe the provenance graph generated for this workflow. Although presented for image data, the proposed concept applies to the more general context of arbitrary clinical data and could serve as an additional service to existing clinical IT systems.
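To make the idea of a graph-based provenance record concrete, the following is a minimal sketch and not the architecture described in the paper. It assumes a PROV-style vocabulary of entities, activities, and the relations `used` and `wasGeneratedBy`; the abstract does not name the provenance model, and the class `ProvenanceGraph` and the step names (`cta_scan`, `segmentation`, `vessel_mask`, `lesion_detection`, `lesion_report`) are hypothetical placeholders for an image processing workflow.

```python
class ProvenanceGraph:
    """Toy directed graph; an edge points from a node to what it depends on."""

    def __init__(self):
        self.nodes = {}   # node id -> kind ("entity" or "activity")
        self.edges = {}   # source id -> list of (relation, target id)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, relation, target):
        # Both endpoints must already be registered as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("unknown node in edge")
        self.edges[source].append((relation, target))

    def lineage(self, node_id):
        """Return all nodes reachable from node_id, in discovery order."""
        seen = []
        stack = [node_id]
        while stack:
            current = stack.pop()
            for _relation, target in self.edges[current]:
                if target not in seen:
                    seen.append(target)
                    stack.append(target)
        return seen


# Hypothetical two-step workflow: segmentation followed by lesion detection.
graph = ProvenanceGraph()
graph.add_node("cta_scan", "entity")
graph.add_node("segmentation", "activity")
graph.add_node("vessel_mask", "entity")
graph.add_node("lesion_detection", "activity")
graph.add_node("lesion_report", "entity")

graph.add_edge("segmentation", "used", "cta_scan")
graph.add_edge("vessel_mask", "wasGeneratedBy", "segmentation")
graph.add_edge("lesion_detection", "used", "vessel_mask")
graph.add_edge("lesion_report", "wasGeneratedBy", "lesion_detection")

report_lineage = graph.lineage("lesion_report")
```

Under these assumptions, `report_lineage` lists every activity and dataset the final report depends on, which is the kind of question a provenance graph is meant to answer. A production repository would likely store such a graph in a database and attach metadata and annotations to each node.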
