Scientific workflows for process mining: building blocks, scenarios, and implementation

Over the past decade, process mining has emerged as a new analytical discipline able to answer a variety of questions based on event data. Event logs have a very particular structure: events have timestamps, refer to activities and resources, and need to be correlated to form process instances. Process mining results tend to be very different from classical data mining results; for example, process discovery may yield end-to-end process models capturing different perspectives rather than decision trees or frequent patterns. A process mining tool like ProM provides hundreds of different process mining techniques, ranging from discovery and conformance checking to filtering and prediction. Typically, a combination of techniques is needed and, for every step, there are different techniques that may be very sensitive to parameter settings. Moreover, event logs may be huge and may need to be decomposed and distributed for analysis. These aspects make it very cumbersome to analyze event logs manually. Process mining should be repeatable and automated. Therefore, we propose a framework to support the analysis of process mining workflows. Existing scientific workflow systems and data mining tools are not tailored towards process mining and the artifacts used for analysis (process models and event logs). This paper structures the basic building blocks needed for process mining and describes various analysis scenarios. Based on these requirements, we implemented RapidProM, a tool supporting scientific workflows for process mining. Examples illustrating the different scenarios are provided to show the feasibility of the approach.
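The event log structure described in the abstract can be illustrated with a minimal sketch. It assumes each event carries a case identifier that serves as the correlation key; the `Event` type, its field names, the helper functions, and the sample data are all hypothetical and are not taken from ProM or RapidProM. The sketch groups events into process instances and counts which activity directly follows which, a simple relation that discovery techniques commonly start from.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    case_id: str         # assumed correlation key (hypothetical field name)
    activity: str
    resource: str
    timestamp: datetime


def build_traces(events):
    """Correlate events into process instances, ordered by timestamp."""
    cases = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        cases[event.case_id].append(event.activity)
    return dict(cases)


def directly_follows(traces):
    """Count how often one activity is immediately followed by another."""
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


# Illustrative data only; not from any real event log.
EVENTS = [
    Event("c1", "register", "ann", datetime(2024, 1, 1, 9, 0)),
    Event("c2", "register", "bob", datetime(2024, 1, 1, 9, 5)),
    Event("c1", "check", "bob", datetime(2024, 1, 1, 9, 30)),
    Event("c2", "reject", "ann", datetime(2024, 1, 1, 10, 0)),
    Event("c1", "approve", "ann", datetime(2024, 1, 1, 11, 0)),
]

TRACES = build_traces(EVENTS)
DFG = directly_follows(TRACES)
```

Here `TRACES` maps each case to its ordered activity sequence, and `DFG` holds the directly-follows counts. Real event logs are typically stored in richer formats and may need more elaborate correlation than a single key.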
