Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies

Frequent Episode Mining (FEM), which aims at mining frequent sub-sequences from a single long event sequence, is one of the essential building blocks of sequence mining research. Existing FEM approaches scale poorly on complex sequences, because testing whether an episode occurs in a sequence is an NP-complete problem. In this article, we propose a scalable, distributed framework to support FEM on “big” event sequences. Here, “big” means that an event sequence is either very long or contains a large number of simultaneous events. In addition, the events in this article are arranged in a predefined hierarchy. The hierarchy gives rise to abstract events, which can form episodes that do not appear directly in the input sequence. Specifically, we devise an event-centered and hierarchy-aware partitioning strategy to allocate events from different levels of the hierarchy to local processes. We then present an efficient special-purpose algorithm to improve local mining performance. We also extend our framework to support maximal and closed episode mining in the context of event hierarchies; to the best of our knowledge, this is the first attempt to define and discover hierarchy-aware maximal and closed episodes. We implement the proposed framework on Apache Spark and conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the efficiency and scalability of the proposed approach and show that taking event hierarchies into account yields practically useful patterns.
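
To make the role of the hierarchy concrete, the following minimal Python sketch shows how an episode over abstract events can be frequent even though it never appears literally in the input. It is an illustration under stated assumptions, not the article's algorithm: the toy hierarchy, the event names, the helper functions, and the support measure (a greedy count of non-overlapping occurrences of a serial episode) are all hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy hierarchy (child -> parent); not taken from the article.
HIERARCHY: Dict[str, str] = {"a1": "A", "a2": "A", "b1": "B"}

# A complex event sequence: (timestamp, set of simultaneous events),
# assumed to be sorted by timestamp.
EventSequence = List[Tuple[int, Set[str]]]


def with_ancestors(event: str, hierarchy: Dict[str, str]) -> Set[str]:
    """Return the event together with all of its ancestors in the hierarchy."""
    result = {event}
    while event in hierarchy:
        event = hierarchy[event]
        result.add(event)
    return result


def generalize(sequence: EventSequence, hierarchy: Dict[str, str]) -> EventSequence:
    """Add every ancestor of every event to the event set at each timestamp."""
    return [
        (t, set().union(*(with_ancestors(e, hierarchy) for e in events)))
        for t, events in sequence
    ]


def count_occurrences(
    sequence: EventSequence, episode: Sequence[str], hierarchy: Dict[str, str]
) -> int:
    """Greedily count non-overlapping occurrences of a non-empty serial episode.

    Assumption: consecutive episode events must occur at strictly increasing
    timestamps, and an input event matches itself or any of its ancestors.
    """
    count = 0
    pos = 0  # index of the next episode event to match
    for _, events in generalize(sequence, hierarchy):
        if episode[pos] in events:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0
    return count


SEQUENCE: EventSequence = [
    (1, {"a1"}),
    (2, {"b1", "a2"}),
    (3, {"b1"}),
    (4, {"a2"}),
    (5, {"b1"}),
]

print(count_occurrences(SEQUENCE, ("A", "B"), HIERARCHY))    # 2
print(count_occurrences(SEQUENCE, ("a1", "b1"), HIERARCHY))  # 1
```

In this toy input, the abstract episode (A, B) occurs twice, while the concrete episode (a1, b1) occurs only once, so a support threshold of 2 would report only the hierarchy-level pattern.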
