Responsible Process Mining - A Data Quality Perspective

Modern organisations consider data to be their lifeblood. The potential benefits of data-driven analyses include a better understanding of business performance and better-informed decision-making for business growth. A key roadblock to this vision is the lack of transparency surrounding the quality of data. A process mining study that uses low-quality, unrepresentative data as input has little or no value for the organisation and becomes a catalyst for erroneous conclusions (‘Garbage-in-Garbage-out’). Many process mining techniques do not take into account inherent inaccuracies in the data, or how the data might have been manipulated or pre-processed. It is thus impossible to ascertain the degree to which analysis outcomes can be relied upon. This tutorial paper outlines foundational concepts of data quality, with a special focus on typical data quality issues found in event data used for process mining analyses. It also elaborates on key challenges and possible approaches to tackling these data quality problems.
