Theoretical Framework for Eliminating Redundancy in Workflows

In this paper we study how to combine and compress a set of workflows so that redundant computation is minimized. In this context, we introduce two novel theoretical problems, with applications in workflow systems and services research, that are duals of each other. The first problem is to merge the maximum number of vertices across two DAGs (directed acyclic graphs) without creating a cycle. We prove that its dual is the problem of maximizing the length of the LCS (longest common subsequence) over all pairs of topological orderings of the two DAGs. This formulation generalizes to a new definition of LCS between complex structures such as workflows or XML documents, which we call M-LCS. We then present a taxonomy of the problems in this family and give a dynamic programming algorithm that finds the M-LCS of a tree and a chain. Alongside this theoretical formulation, we implement the algorithms in C++ and run them on representative workflows. We evaluate the M-LCS algorithm on a set of random workflows and observe that it performs substantially better than traditional AI-based approaches.
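
To make the dual formulation concrete, the following is a minimal brute-force sketch of the M-LCS objective for two very small DAGs. It is not the paper's dynamic programming algorithm, and it is written in Python for brevity rather than the C++ of the implementation. It assumes that each vertex is identified by a task label that is unique within its DAG, and that two vertices may be merged only when their labels match. The function names (topological_orderings, lcs_length, mlcs_bruteforce) and the (vertices, edges) input format are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def topological_orderings(vertices, edges):
    """Yield every ordering of `vertices` that respects all (u, v) edges.

    Exhaustive enumeration: only practical for a handful of vertices.
    """
    for perm in permutations(vertices):
        position = {v: i for i, v in enumerate(perm)}
        if all(position[u] < position[v] for u, v in edges):
            yield perm


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def mlcs_bruteforce(dag_a, dag_b):
    """Maximum LCS length over all pairs of topological orderings.

    Each DAG is a (vertices, edges) pair; this input format is an
    assumption made for this sketch, not taken from the paper.
    """
    return max(
        lcs_length(p, q)
        for p in topological_orderings(*dag_a)
        for q in topological_orderings(*dag_b)
    )


# A diamond-shaped workflow and a chain over the same four task labels.
diamond = (["a", "b", "c", "d"],
           [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
chain = (["a", "c", "b", "d"],
         [("a", "c"), ("c", "b"), ("b", "d")])

# Two chains with opposite orientation over the same two labels.
forward = (["a", "b"], [("a", "b")])
backward = (["a", "b"], [("b", "a")])

print(mlcs_bruteforce(diamond, chain))     # 4
print(mlcs_bruteforce(forward, backward))  # 1
```

In the first example the ordering a, c, b, d is valid for both DAGs, so all four vertices can be merged without creating a cycle. In the second, merging both vertices would produce the cycle a, b, a, and the best pair of orderings shares only one label. The enumeration is factorial in the number of vertices, which is why a dedicated algorithm is needed for anything beyond toy inputs.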