Event stream-based process discovery using abstract representations

Process discovery, which originates from the area of process mining, aims to discover a process model from business process execution data. The majority of process discovery techniques rely on an event log as input. An event log is a static source of historical data capturing the execution of a business process. In this paper, we focus on process discovery based on online streams of business process execution events. Learning process models from event streams poses both challenges and opportunities; in particular, we need to handle unbounded amounts of data using finite memory and, preferably, constant time. We propose a generic architecture that allows several classes of existing process discovery techniques to be adopted in the context of event streams. Moreover, we provide several instantiations of the architecture, accompanied by implementations in the process mining toolkit ProM (http://promtools.org). Using these instantiations, we evaluate several dimensions of stream-based process discovery. The evaluation shows that the proposed architecture allows us to lift process discovery to the streaming domain.
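To make the finite-memory requirement concrete, the following is a minimal sketch of how an abstract representation could be maintained over an event stream. It is an illustration under stated assumptions, not the architecture proposed in the paper: the choice of a directly-follows relation as the abstraction, the `StreamingDirectlyFollows` class, the `max_cases` parameter, and the oldest-first eviction policy are all hypothetical. Events are assumed to arrive as `(case_id, activity)` pairs, and the set of activities is assumed to be finite, so the pair counter stays bounded.

```python
from collections import Counter, OrderedDict


class StreamingDirectlyFollows:
    """Hypothetical sketch: count directly-follows pairs over an event stream.

    Memory is bounded by keeping the last activity of at most `max_cases`
    cases; the least recently updated case is forgotten when the bound is hit.
    """

    def __init__(self, max_cases=1000):
        self.max_cases = max_cases
        self.last_activity = OrderedDict()  # case_id -> last seen activity
        self.pairs = Counter()              # (a, b) -> number of times b followed a

    def observe(self, case_id, activity):
        # Remove the case (if known) so that re-inserting moves it to the end.
        previous = self.last_activity.pop(case_id, None)
        if previous is not None:
            self.pairs[(previous, activity)] += 1
        self.last_activity[case_id] = activity
        # Enforce the memory bound by evicting the least recently updated case.
        if len(self.last_activity) > self.max_cases:
            self.last_activity.popitem(last=False)


# Usage: feed events one at a time, as they would arrive on a stream.
stream = [("c1", "a"), ("c2", "a"), ("c1", "b"), ("c2", "c"), ("c1", "d")]
dfg = StreamingDirectlyFollows(max_cases=2)
for case_id, activity in stream:
    dfg.observe(case_id, activity)
```

Each event is handled with a constant number of dictionary operations. The trade-off is that an evicted case loses its history, so a later event of that case is treated as the start of a new trace. The resulting pair counts could then be handed to a conventional discovery technique that accepts such an abstraction as input.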
