Conformance Checking Using Activity and Trace Embeddings

Conformance checking comprises process mining techniques that compare an event log with a corresponding process model. In this paper, we propose a new approach to conformance checking based on neural network embeddings. These embeddings are vector representations of every activity (task) present in the model and log, obtained via act2vec, a Word2vec-based model. Our approach applies the Word Mover’s Distance to the activity embeddings of traces in order to measure fitness and precision. In addition, we investigate the Iterative Constrained Transfers measure, a lower bound of the Word Mover’s Distance that can be computed more efficiently. We also introduce an alternative method that uses trace2vec, a Doc2vec-based model, to train and compare vector representations of the process instances themselves. These methods are tested in different settings and compared to other conformance checking techniques, showing promising results.
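To make the distance computation concrete, the following is a minimal Python sketch of an Iterative Constrained Transfers style lower bound between two traces. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the two-dimensional embeddings in `TOY_EMBEDDINGS` are invented stand-ins for trained act2vec vectors, the function names are hypothetical, and taking the maximum of the two one-sided costs is one common way to obtain a symmetric bound.

```python
import math
from collections import Counter

# Hypothetical 2-D activity embeddings; real ones would come from act2vec.
TOY_EMBEDDINGS = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (0.0, 1.0),
    "d": (1.0, 1.0),
}


def normalized_bag(trace):
    """Turn a trace (sequence of activities) into relative frequencies."""
    counts = Counter(trace)
    return {activity: n / len(trace) for activity, n in counts.items()}


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def one_sided_ict(source, target, embeddings):
    """Move each source activity's mass to its nearest target activities.

    Each single transfer is capped by the target activity's weight, but the
    caps are not shared between source activities. This relaxes the exact
    transport problem, so the cost is a lower bound on it.
    """
    cost = 0.0
    for activity, mass in source.items():
        # Distances from this source activity to every target activity,
        # visited from nearest to farthest.
        distances = sorted(
            (euclidean(embeddings[activity], embeddings[other]), weight)
            for other, weight in target.items()
        )
        remaining = mass
        for distance, capacity in distances:
            if remaining <= 1e-12:
                break
            moved = min(remaining, capacity)
            cost += moved * distance
            remaining -= moved
    return cost


def ict_distance(trace_a, trace_b, embeddings=TOY_EMBEDDINGS):
    """Symmetric lower bound: the larger of the two one-sided costs."""
    p, q = normalized_bag(trace_a), normalized_bag(trace_b)
    return max(one_sided_ict(p, q, embeddings), one_sided_ict(q, p, embeddings))


if __name__ == "__main__":
    log_trace = ["a", "b", "c"]
    model_trace = ["a", "b", "d"]
    print(ict_distance(log_trace, model_trace))
```

In this sketch a log trace would be compared against traces generated from the model, and the resulting distances would then be aggregated into fitness and precision scores; that aggregation step is not shown here.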
