Efficient Recovery of Missing Events

Owing to various data entry and transmission issues caused by humans or systems, missing events often occur in event data, which record the execution logs of business processes. Unless the missing events are recovered, applications built upon event data, such as provenance analysis or complex event processing, are not reliable. Following the minimum change principle in improving data quality, it is also rational to find a recovery that differs minimally from the original data. Existing recovery approaches are inefficient because they enumerate and search over all possible sequences of events. In this paper, we study efficient techniques for recovering missing events. Our theoretical results indicate that the recovery problem is NP-hard. Nevertheless, advanced indexing and pruning techniques are developed to improve recovery efficiency. The experimental results demonstrate that our minimum recovery approach achieves high accuracy and outperforms the state-of-the-art technique by up to five orders of magnitude in time performance.
