A Co-Training Strategy for Multiple View Clustering in Process Mining

Process mining refers to the discovery, conformance, and enhancement of process models from the event logs produced by information systems (e.g., workflow management systems). By tightly coupling event logs and process models, process mining makes it possible to detect deviations, predict delays, support decision making, and recommend process redesigns. Event logs are data sets containing the executions (called traces) of a business process. Several process mining algorithms have been defined to mine event logs and deliver valuable models (e.g., Petri nets) of how logged processes are being executed. However, they often generate spaghetti-like process models, which can be hard to understand. This is caused by the inherent complexity of real-life processes, which tend to be less structured and more flexible than stakeholders typically expect. In particular, spaghetti-like process models are discovered when the traces in the event log are considered all at once, so that all possible behaviors are shown in a single model. To mitigate this problem, trace clustering can be used as a preprocessing step. It splits an event log into clusters of similar traces, so as to handle variability in the recorded behavior and facilitate process model discovery. In this paper, we investigate a multiple-view-aware approach to trace clustering, based on a co-training strategy. In an assessment using benchmark event logs, we show that the presented algorithm discovers a clustering of the log in which related traces are grouped together appropriately. We evaluate the significance of the resulting clusters using established machine learning and process mining metrics.
