Mining Program Workflow from Interleaved Logs

Software maintenance is becoming increasingly critical as society and the economy grow more dependent on software systems. A key problem in software maintenance is the difficulty of understanding evolving software systems. Program workflows help system operators and administrators understand system behaviors and verify system executions, which greatly facilitates system maintenance. In this paper, we propose an algorithm that automatically discovers program workflows from event traces recorded during system execution. Unlike existing workflow mining algorithms, our approach can construct concurrent workflows from traces of interleaved events. Our workflow mining approach is a three-step coarse-to-fine algorithm. First, we mine temporal dependencies for each pair of events. Then, based on the mined pairwise temporal dependencies, we construct a basic workflow model using a breadth-first path pruning algorithm. Finally, we refine the workflow by verifying it against all training event traces. The refinement algorithm tries to find a workflow that can interpret all event traces with minimal state transitions and threads. Results on both simulated data and real program data show that our algorithm is highly effective.
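
As an illustration of the first step only, the following is a minimal sketch of mining pairwise temporal dependencies from event traces. It is not the paper's algorithm: the abstract does not define the dependency relation, so the sketch assumes a simple "forward" rule (every occurrence of event a is eventually followed by event b in the same trace), and the function name `mine_pairwise_dependencies` and the trace format (lists of event labels) are hypothetical.

```python
from itertools import permutations


def mine_pairwise_dependencies(traces):
    """Return the set of ordered pairs (a, b) such that, in every trace
    containing a, each occurrence of a is followed later by some b.

    Assumption: this "forward dependency" rule is an illustrative stand-in;
    the paper's actual dependency definitions may differ.
    """
    events = {event for trace in traces for event in trace}
    dependencies = set()
    for a, b in permutations(sorted(events), 2):
        a_observed = False
        holds = True
        for trace in traces:
            positions = [i for i, event in enumerate(trace) if event == a]
            if not positions:
                continue  # a trace without a cannot violate the rule
            a_observed = True
            # If the last a is followed by b, every earlier a is as well.
            if b not in trace[positions[-1] + 1:]:
                holds = False
                break
        if a_observed and holds:
            dependencies.add((a, b))
    return dependencies


# Two traces in which B and C appear in either order (interleaved).
example_traces = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
]
example_dependencies = mine_pairwise_dependencies(example_traces)
```

In this example, neither (B, C) nor (C, B) is mined, because each order is violated by one trace. Under the stated assumption, such unordered pairs are the candidates for concurrent branches that a later model-construction step would have to place on separate threads.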
