Healthcare data, like any data, may suffer from many kinds of quality problems. In this chapter, we identify 27 data quality issues that may compromise the validity of process mining results. Examples are missing data, incorrect data, imprecise data, and irrelevant data. For instance, an event may only have a date (e.g., 15-6-2015) and not a fine-grained timestamp. As a result, the ordering of events that occur on the same day is unknown, which complicates analysis. Practitioners were interviewed to estimate the frequency of the 27 types of data quality issues identified. This provides insights into typical problems that may arise in data-science projects in hospitals. The quality of the analysis results depends directly on the quality of the input data (i.e., garbage in, garbage out). Therefore, the chapter also discusses 12 guidelines for logging. These guidelines should be used when developing the next generation of hospital information systems. Improved event logs will enable more advanced forms of process mining related to prediction and recommendation.
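The date-only example can be illustrated with a minimal Python sketch. It is not taken from the chapter: the event tuples, the activity names, and the helper `ambiguous_orderings` are hypothetical and serve only to show how coarse timestamps leave the order of events within a case undetermined.

```python
from collections import defaultdict
from datetime import datetime


def ambiguous_orderings(events):
    """Return groups of events in the same case that share an identical timestamp.

    `events` is an iterable of (case_id, activity, timestamp) tuples. This
    layout is an assumption made for illustration, not a standard log format.
    """
    groups = defaultdict(list)
    for case_id, activity, timestamp in events:
        groups[(case_id, timestamp)].append(activity)
    # Only groups with more than one event have an undetermined order.
    return {key: acts for key, acts in groups.items() if len(acts) > 1}


# Hypothetical log in which only the date was recorded (time defaults to 00:00).
events = [
    ("patient-1", "registration", datetime(2015, 6, 15)),
    ("patient-1", "blood test", datetime(2015, 6, 15)),
    ("patient-1", "consultation", datetime(2015, 6, 15)),
    ("patient-1", "discharge", datetime(2015, 6, 16)),
]

ties = ambiguous_orderings(events)
for (case_id, timestamp), activities in ties.items():
    print(case_id, timestamp.date(), activities)
```

In this sketch, the three events recorded on 15-6-2015 cannot be ordered from the log alone, so any discovered process model would have to treat all of their permutations as equally plausible.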