Mining and reasoning on workflows

Workflow management systems are a key technological infrastructure for advanced applications and attract a growing body of research. Most of this research focuses on developing tools that allow users both to specify the "static" aspects of a workflow, such as preconditions, precedences among activities, and rules for exception handling, and to control its execution by scheduling the activities on the available resources. This paper deals with an aspect of workflows that has so far received little attention even though it is crucial for forthcoming scenarios of large-scale applications on the Web: providing facilities that help the human system administrator identify the choices that were made most frequently in the past and that led to a desired final configuration. In this context, we formalize the problem of discovering the most frequent patterns of execution, i.e., the workflow substructures that have been scheduled most frequently by the system. We attack the problem by developing two data mining algorithms based on an intuitive and original graph formalization of a workflow schema and its occurrences. The model is used both to prove some intractability results that strongly motivate the use of data mining techniques and to derive interesting structural properties for reducing the search space for frequent patterns. The experiments we have carried out show that our algorithms outperform standard data mining algorithms adapted to discover frequent patterns of workflow executions.
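To make the notion of a frequent execution pattern concrete, the following is a minimal brute-force sketch, not either of the two algorithms developed in the paper. It assumes, purely for illustration, that each workflow execution is recorded as a list of (source activity, target activity) edges, and that a pattern is any set of at most `max_size` edges; the names `frequent_patterns`, `min_support`, and `max_size` are hypothetical. It ignores connectivity and the constraints of the workflow schema, which are exactly the structural properties the paper exploits to prune the search space.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(executions, min_support, max_size=2):
    """Count edge sets of size <= max_size and keep those whose support
    (fraction of executions containing them) is at least min_support.

    Hypothetical helper for illustration only: it enumerates all edge
    combinations, so its cost grows exponentially with max_size.
    """
    counts = Counter()
    for edges in executions:
        unique = sorted(set(edges))  # count each edge once per execution
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(executions)
    return {pattern: n for pattern, n in counts.items() if n >= threshold}


# Three made-up executions of a workflow with activities a, b, c, d.
executions = [
    [("a", "b"), ("b", "d")],
    [("a", "b"), ("b", "d")],
    [("a", "c"), ("c", "d")],
]
result = frequent_patterns(executions, min_support=0.6)
for pattern, n in result.items():
    print(sorted(pattern), n)
```

With a support threshold of 0.6, only the edges along the path a, b, d (and their combination) survive, since they appear in two of the three executions. The exponential enumeration in this sketch is one informal way to see why a naive approach does not scale and why pruning the search space matters.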
