Data Quality Issues

Healthcare data, like any data, can suffer from many kinds of quality problems. In this chapter, we identify 27 data quality issues that may compromise the validity of process mining results. Examples include missing, incorrect, imprecise, and irrelevant data. For instance, an event may have only a date (e.g., 15-6-2015) rather than a fine-grained timestamp; as a result, the ordering of events recorded on the same day is unknown, which complicates analysis. Practitioners were interviewed to estimate how frequently each of the 27 types of data quality issues occurs. This provides insight into the typical problems that arise in data-science projects in hospitals. Because the quality of the analysis results depends directly on the input data (i.e., garbage in, garbage out), the chapter also discusses 12 guidelines for logging. These guidelines should be used when developing the next generation of hospital information systems. Improved event logs will enable more advanced forms of process mining related to prediction and recommendation.
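To make the date-only timestamp problem concrete, the following Python sketch flags, for each case, the days on which two or more events were recorded without a time of day. Within such a group, the true order of events cannot be recovered from the log alone. The event tuples, the `ambiguous_orderings` helper, and the day-month-year date format are hypothetical illustrations, not part of the chapter's method.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case id, activity, date recorded without a time of day).
events = [
    ("patient-1", "Registration", "15-6-2015"),
    ("patient-1", "Lab test", "15-6-2015"),
    ("patient-1", "Consultation", "16-6-2015"),
    ("patient-2", "Registration", "15-6-2015"),
]


def ambiguous_orderings(events):
    """Return, per (case, day), the activities whose relative order cannot be
    recovered because they share the same date-only timestamp."""
    groups = defaultdict(list)
    for case_id, activity, date_str in events:
        # Assumed day-month-year format, as in the example date 15-6-2015.
        day = datetime.strptime(date_str, "%d-%m-%Y").date()
        groups[(case_id, day)].append(activity)
    # Only days with two or more events in the same case are ambiguous.
    return {key: acts for key, acts in groups.items() if len(acts) > 1}


for (case_id, day), activities in ambiguous_orderings(events).items():
    print(case_id, day.isoformat(), activities)
```

Here only patient-1 on 15 June 2015 is flagged: the log cannot tell whether the registration or the lab test came first, while events on different days can still be ordered.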