Pattern Mining for Predicting Critical Events from Sequential Event Data Log

Abstract This paper studies the mining of patterns for predicting critical events from observed ordered event data, where the observations can contain events interleaved from non-predictor sequences and from other predictor sequences. These characteristics arise in many practical applications, such as monitoring of power systems or telecommunication networks, as well as in computational biology. For settings where system behavior is affected by noise, so that a critical event can sometimes occur without its predictor having executed beforehand, we propose an algorithm that recursively computes the frequency with which a predictor candidate precedes the critical event. We use this frequency to identify a predictor, and we study the performance of such a scheme. We also consider the noise-free setting, in which a critical event occurs only after the execution of its predictor, and propose an algorithm that recursively computes the set of maximal predictors for each critical event.
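
As a rough illustration of the quantity used in the noisy setting, the sketch below estimates how often a candidate pattern precedes a critical event in a log. It is not the recursive algorithm proposed in the paper: it is a plain batch computation under assumptions that the abstract does not state. In particular, it assumes that the log is a flat sequence of event symbols, that a candidate "precedes" a critical event if it appears as a (possibly interleaved) subsequence of the events observed since the previous occurrence of that critical event, and that the frequency is the fraction of critical-event occurrences for which this holds. The names `is_subsequence` and `precedence_frequency` are hypothetical.

```python
from typing import Hashable, Sequence


def is_subsequence(candidate: Sequence[Hashable], window: Sequence[Hashable]) -> bool:
    """Return True if `candidate` occurs in `window` in order, allowing
    other (interleaved) events between its symbols."""
    remaining = iter(window)
    # `sym in remaining` advances the iterator until it finds `sym`,
    # so the symbols of `candidate` must be matched in order.
    return all(sym in remaining for sym in candidate)


def precedence_frequency(
    log: Sequence[Hashable],
    critical: Hashable,
    candidate: Sequence[Hashable],
) -> float:
    """Fraction of occurrences of `critical` that are preceded by `candidate`.

    Assumption (not from the paper): the window examined for each occurrence
    starts right after the previous occurrence of `critical`.
    """
    hits = 0
    total = 0
    window_start = 0
    for i, event in enumerate(log):
        if event == critical:
            total += 1
            if is_subsequence(candidate, log[window_start:i]):
                hits += 1
            window_start = i + 1
    return hits / total if total else 0.0


# Example: "F" is the critical event; the candidate predictor is a followed by b.
log = list("axbyFcaybFxxF")
freq = precedence_frequency(log, "F", ("a", "b"))
```

In this example the candidate is found before the first two occurrences of `F` but not before the third, which models a noisy critical event that occurs without its predictor. A candidate whose frequency is high relative to the other candidates would then be a natural choice of predictor under these assumptions.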
