Composition and Substitution in Provenance and Workflows

It is generally accepted that any comprehensive provenance model must allow provenance to be described at various levels of granularity. For example, if a provenance graph of a process has nodes that describe subprocesses, we need a method of expanding these nodes to obtain a more detailed provenance graph. To date, most work that has attempted to formalize this notion has taken a descriptive approach: for example, given two provenance graphs, under what conditions is one "finer grained" than the other? In this paper we take an operational approach. For example, given two provenance graphs of interacting processes, what does it mean to compose those graphs? Also, given a provenance graph of a process and a provenance graph of one of its subprocesses, what is the substitution operation that allows us to expand the former into a finer-grained graph? These questions apply not only to provenance graphs but also to workflow graphs and other process models that occur in computer science. We propose a model and operations that address these problems. While it is only one of a number of possible solutions, it does indicate that a basic adjustment to provenance models is needed if they are to properly accommodate such an operational approach to composition and substitution.