Likelihood-based Multiple Imputation by Event Chain Methodology for Repair of Imperfect Event Logs with Missing Data

Event logs recorded by an information system may be incomplete for various reasons, resulting in an imperfect event log. Analyses performed on such a log can seriously degrade the quality of the results, so the missing parts should be handled before any analysis is carried out. In data mining and statistical analysis, various methodologies have been developed for handling data with missing values, but in process mining few studies address incomplete event logs with missing data. In this paper, we propose a likelihood-based Multiple Imputation by Event Chain (MIEC) method for dealing with imperfect event logs with missing data. To verify the method, we performed an experiment using sample event logs and conducted a case study using a real steel manufacturing event log. We expect the proposed method to repair imperfect event logs to a high level and to yield high-quality analysis results even when a large amount of data is missing.
