An Approach for Incorporating Expert Knowledge in Trace Clustering

Trace clustering techniques partition traces, or process instances, into groups of similar traces. Typically, this partitioning is based on patterns in or similarity between the traces, or is achieved by discovering a process model for each cluster of traces. However, the clustering solutions obtained by these approaches are often hard to understand or difficult to validate against an expert’s domain knowledge. Therefore, we propose a novel semi-supervised trace clustering technique based on expert knowledge. Our approach is validated using a case on tablet reading behaviour, but is widely applicable in other contexts. In an experimental evaluation, the technique is shown to provide a beneficial trade-off between performance and understandability.
