Multi-Dimensional Event Data in Graph Databases

Process event data is usually stored either in a sequential process event log or in a relational database. While the sequential, single-dimensional nature of event logs aids querying for (sub)sequences of events based on temporal relations such as "directly/eventually-follows", it does not support querying multi-dimensional event data of multiple related entities. Relational databases allow storing multi-dimensional event data, but existing query languages do not support querying for sequences or paths of events in terms of temporal relations. In this paper, we propose a general data model for multi-dimensional event data based on labeled property graphs that systematically stores structural and temporal relations in a single, integrated graph-based data structure. We provide semantics for all concepts of our data model, and generic queries for modeling event data over multiple entities that interact synchronously and asynchronously. The queries allow for efficiently converting large real-life event data sets into our data model, and we provide five converted data sets for further research. We show that typical and advanced queries for retrieving and aggregating such multi-dimensional event data can be formulated and executed efficiently in the existing query language Cypher, giving rise to several new research questions. Specifically, aggregation queries on our data model enable process mining over multiple interrelated entities using off-the-shelf technology.
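The idea of keeping structural and temporal relations in one graph can be illustrated with a minimal in-memory sketch. It is not the paper's implementation and does not use Cypher or a graph database; the names `Event`, `corr`, `directly_follows`, and `aggregate_by_activity`, as well as the toy order/item data, are assumptions made for illustration. Events are correlated to one or more entities (the structural relation), a directly-follows edge is derived per entity from the timestamps (the temporal relation), and those edges are then aggregated to activity level.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    id: str
    activity: str
    timestamp: int  # any sortable time value


# Hypothetical toy data: one order and two items that share some events.
events = [
    Event("e1", "Create Order", 1),
    Event("e2", "Pick Item", 2),
    Event("e3", "Pick Item", 3),
    Event("e4", "Ship Order", 4),
]

# Structural relation: (event id, entity id) correlation edges.
# A single event may be correlated to several entities.
corr = [
    ("e1", "order1"),
    ("e2", "order1"), ("e2", "item1"),
    ("e3", "order1"), ("e3", "item2"),
    ("e4", "order1"), ("e4", "item1"), ("e4", "item2"),
]


def directly_follows(events, corr):
    """Derive (source event, target event, entity) edges per entity."""
    by_id = {e.id: e for e in events}
    per_entity = defaultdict(list)
    for event_id, entity_id in corr:
        per_entity[entity_id].append(by_id[event_id])
    edges = []
    for entity_id, entity_events in per_entity.items():
        ordered = sorted(entity_events, key=lambda e: e.timestamp)
        # Consecutive events of the same entity directly follow each other.
        for src, dst in zip(ordered, ordered[1:]):
            edges.append((src.id, dst.id, entity_id))
    return edges


def aggregate_by_activity(edges, events):
    """Count directly-follows edges between activity names."""
    by_id = {e.id: e for e in events}
    return Counter(
        (by_id[src].activity, by_id[dst].activity) for src, dst, _ in edges
    )


df_edges = directly_follows(events, corr)
activity_counts = aggregate_by_activity(df_edges, events)
```

In this toy data, `e2` is directly followed by `e3` from the order's perspective but by `e4` from the perspective of `item1`, which is the kind of multi-entity behavior a single sequential log cannot express. In a graph database, the same derivation and aggregation would be written as queries over event and entity nodes rather than as Python loops.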
