Conformance checking and diagnosis in process mining

In the last decades, the capability of information systems to generate and record overwhelming amounts of event data has experimented an exponential growth in several domains, and in particular in industrial scenarios. Devices connected to the internet (internet of things), social interaction, mobile computing, and cloud computing provide new sources of event data and this trend will continue in the next decades. The omnipresence of large amounts of event data stored in logs is an important enabler for process mining, a novel discipline for addressing challenges related to business process management, process modeling, and business intelligence. Process mining techniques can be used to discover, analyze and improve real processes, by extracting models from observed behavior. The capability of these models to represent the reality determines the quality of the results obtained from them, conditioning its usefulness. Conformance checking is the aim of this thesis, where modeled and observed behavior are analyzed to determine if a model defines a faithful representation of the behavior observed a the log. Most of the efforts in conformance checking have focused on measuring and ensuring that models capture all the behavior in the log, i.e., fitness. Other properties, such as ensuring a precise model (not including unnecessary behavior) have been disregarded. The first part of the thesis focuses on analyzing and measuring the precision dimension of conformance, where models describing precisely the reality are preferred to overly general models. The thesis includes a novel technique based on detecting escaping arcs, i.e., points where the modeled behavior deviates from the one reflected in log. The detected escaping arcs are used to determine, in terms of a metric, the precision between log and model, and to locate possible actuation points in order to achieve a more precise model. The thesis also presents a confidence interval on the provided precision metric, and a multi-factor measure to assess the severity of the detected imprecisions. Checking conformance can be time consuming for real-life scenarios, and understanding the reasons behind the conformance mismatches can be an effort-demanding task. The second part of the thesis changes the focus from the precision dimension to the fitness dimension, and proposes the use of decomposed techniques in order to aid in checking and diagnosing fitness. The proposed approach is based on decomposing the model into single entry single exit components. The resulting fragments represent subprocesses within the main process with a simple interface with the rest of the model. Fitness checking per component provides well-localized conformance information, aiding on the diagnosis of the causes behind the problems. Moreover, the relations between components can be exploded to improve the diagnosis capabilities of the analysis, identifying areas with a high degree of mismatches, or providing a hierarchy for a zoom-in zoom-out analysis. Finally, the thesis proposed two main applications of the decomposed approach. First, the theory proposed is extended to incorporate data information for fitness checking in a decomposed manner. Second, a real-time event-based framework is presented for monitoring fitness.