Extracting useful knowledge from event logs: A frequent itemset mining approach

Abstract

Business process analysis is a key activity that aims at increasing the efficiency of business operations. In recent years, several data-mining-based methods have been designed for discovering interesting patterns in event logs. A popular type of method applies frequent itemset mining to extract patterns indicating how resources and activities are frequently used. Although these methods are useful, they have two important limitations. First, they are designed to be applied only to original event logs. Because they do not consider other perspectives on the data that could be obtained by applying data transformations, they miss many patterns that may represent important information for businesses. Second, they can generate a large number of patterns, since minimum support is the only constraint they use to select patterns. Analyzing a large number of patterns is time-consuming for users, and many of the patterns found may be irrelevant. To address these issues, this paper presents an improved event log analysis approach named AllMining. It includes a novel pre-processing method that uses transformations to construct multiple types of transaction databases from the same original event log. This allows many new useful types of patterns to be extracted from event logs with frequent itemset mining techniques. To address the second issue, a pruning strategy is further developed based on a novel concept of pattern coverage, so that decision makers are presented with a small set of patterns that covers many events. Results of experiments on real-life event logs show that the proposed approach is promising compared to existing frequent itemset mining approaches and state-of-the-art process model algorithms.
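The abstract does not specify AllMining's transformations or its exact definition of pattern coverage, so the following is only a minimal sketch of the general idea under stated assumptions. It (1) transforms one event log into several transaction databases by choosing what forms a transaction and what forms an item, (2) mines frequent itemsets with a minimum support threshold using a naive Apriori-style loop, and (3) keeps a small set of patterns by greedily maximizing the number of covered events. The toy log, the three views, the coverage definition, the greedy selection rule, and all helper names (`build_transactions`, `frequent_itemsets`, `covered_events`, `select_by_coverage`) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy event log: (case_id, activity, resource). Not data from the paper.
EVENT_LOG = [
    ("c1", "register", "alice"), ("c1", "check", "bob"), ("c1", "approve", "carol"),
    ("c2", "register", "alice"), ("c2", "check", "bob"), ("c2", "reject", "carol"),
    ("c3", "register", "dave"), ("c3", "check", "bob"), ("c3", "approve", "carol"),
]


def build_transactions(log, group, item):
    """Transform an event log into a transaction database (hypothetical helper).

    group(event) chooses the transaction an event belongs to (e.g. its case);
    item(event) chooses the item it contributes (e.g. its activity).
    Returns {transaction_id: {item: set of event indices}} so coverage can be traced back.
    """
    db = defaultdict(lambda: defaultdict(set))
    for idx, event in enumerate(log):
        db[group(event)][item(event)].add(idx)
    return db


def frequent_itemsets(db, min_sup):
    """Apriori-style level-wise mining; support = number of transactions containing the itemset."""
    transactions = [frozenset(t) for t in db.values()]
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    result = {}
    while level:
        size = len(level[0])
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = [c for c, n in counts.items() if n >= min_sup]
        result.update((c, counts[c]) for c in frequent)
        # Join: merge pairs of frequent itemsets into candidates one item larger.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == size + 1}
        # Prune: every subset with one item removed must itself be frequent.
        known = set(frequent)
        level = [c for c in candidates if all(c - {x} in known for x in c)]
    return result


def covered_events(db, pattern):
    """ASSUMED coverage definition: the events realising the pattern's items
    in every transaction that contains the whole pattern."""
    cov = set()
    for items in db.values():
        if pattern.issubset(items):
            for it in pattern:
                cov |= items[it]
    return cov


def select_by_coverage(db, patterns, max_patterns):
    """Greedy, set-cover-style pruning (an assumption, not necessarily AllMining's rule):
    repeatedly keep the pattern that covers the most not-yet-covered events."""
    remaining = {p: covered_events(db, p) for p in patterns}
    covered, chosen = set(), []
    while remaining and len(chosen) < max_patterns:
        # Ties go to the longer pattern, then to sorted item names for determinism.
        best = max(remaining, key=lambda p: (len(remaining[p] - covered), len(p), sorted(p)))
        gain = remaining.pop(best) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
    return chosen, covered


if __name__ == "__main__":
    # Three hypothetical views of the same log, each yielding a different transaction database.
    views = {
        "activities per case": (lambda e: e[0], lambda e: e[1]),
        "activities per resource": (lambda e: e[2], lambda e: e[1]),
        "activity@resource per case": (lambda e: e[0], lambda e: f"{e[1]}@{e[2]}"),
    }
    for name, (group, item) in views.items():
        db = build_transactions(EVENT_LOG, group, item)
        freq = frequent_itemsets(db, min_sup=2)
        chosen, covered = select_by_coverage(db, freq, max_patterns=2)
        print(f"{name}: {len(freq)} frequent itemsets; kept {[sorted(p) for p in chosen]}, "
              f"covering {len(covered)}/{len(EVENT_LOG)} events")
```

In the "activities per case" view, seven itemsets reach a support of 2, but the greedy step keeps only {register, check, approve} and {register, check}, which together cover 8 of the 9 events. Each view produces a different transaction database from the same log, which illustrates the multi-perspective idea; a practical implementation would replace the naive Apriori loop with a more efficient frequent itemset miner.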
