A scalable complex event analytical system with incremental episode mining over data streams

Episode pattern mining is a powerful technique for extracting valuable information to solve real-life cross-disciplinary problems, such as the analysis of manufacturing processes, stock markets and weather records. As data grows, the mining process must be re-run repeatedly to keep the results up to date. Periodically re-mining the full dataset is not cost-effective, so a number of incremental mining approaches have been proposed for growing data. However, to the best of our knowledge, few studies have addressed the problem of incremental episode mining. Moreover, streams of complex events are increasingly common in this big data age, as digital sensors continuously collect data around us. The challenge is therefore not only to mine valuable episode patterns from an incrementally growing dataset, but also to mine episode patterns over data streams of complex events. To address this problem, we adopt the Lambda Architecture to design a scalable complex event analytical system that facilitates incremental episode mining over complex event sequences from data streams. Apache Spark and Apache Spark Streaming serve as the development frameworks for the batch layer and the speed layer, respectively. To balance efficiency and accuracy, we develop a series of modules and three algorithms: batch episode mining, delta episode mining and pattern merging. Experimental results on a real dataset show that the proposed system is highly scalable and achieves excellent performance in terms of both efficiency and accuracy.
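The abstract describes the split into batch episode mining, delta episode mining and pattern merging only at a high level. The Python sketch below illustrates one way these three steps could fit together; it is not the paper's algorithm, and all of its choices are our assumptions: only serial episodes are mined, timestamps are strictly increasing, support is a "head" count (the number of time points at which an occurrence starts and completes within `window` time units), and merging simply sums the supports from the two layers before applying a relative threshold. The names `occurs_from`, `mine_counts` and `merge_patterns` are hypothetical, and plain Python stands in for Spark and Spark Streaming.

```python
from collections import Counter
from itertools import product


def occurs_from(seq, start, episode, window):
    """True if the serial episode occurs in order, beginning at seq[start]
    and ending no later than t0 + window - 1 (assumes sorted, distinct timestamps)."""
    t0, first_events = seq[start]
    if episode[0] not in first_events:
        return False
    pos = 1  # index of the next episode element to match
    for t, events in seq[start + 1:]:
        if pos == len(episode) or t > t0 + window - 1:
            break
        if episode[pos] in events:
            pos += 1  # greedy earliest match suffices for serial episodes
    return pos == len(episode)


def mine_counts(seq, max_len, window):
    """Hypothetical batch/delta miner: head support of every serial episode
    up to max_len. Brute-force enumeration; a real miner would prune level-wise."""
    alphabet = sorted(set().union(*(events for _, events in seq)))
    counts = Counter()
    for k in range(1, max_len + 1):
        for episode in product(alphabet, repeat=k):
            support = sum(occurs_from(seq, i, episode, window) for i in range(len(seq)))
            if support:
                counts[episode] = support
    return counts


def merge_patterns(batch_counts, delta_counts, n_points, min_rel_sup):
    """Hypothetical merge: sum supports from both layers, keep frequent episodes.
    Occurrences that cross the batch/delta boundary are NOT counted, so the
    merged supports are lower bounds on the true supports."""
    merged = batch_counts + delta_counts
    threshold = min_rel_sup * n_points
    return {ep: s for ep, s in merged.items() if s >= threshold}


if __name__ == "__main__":
    # A complex event sequence: (timestamp, set of event types at that time).
    history = [(1, {"A"}), (2, {"B", "C"}), (3, {"A"}), (4, {"B"})]
    delta = [(5, {"A", "C"}), (6, {"B"})]
    batch = mine_counts(history, max_len=2, window=2)  # batch layer
    speed = mine_counts(delta, max_len=2, window=2)    # speed layer (new data only)
    print(merge_patterns(batch, speed, n_points=len(history) + len(delta), min_rel_sup=0.5))
    # prints {('A',): 3, ('B',): 3, ('A', 'B'): 3}
```

In this example the occurrence of (B, A) from time 4 to time 5 straddles the boundary between the historical data and the new data, so neither layer counts it. A merge step that aims for exact results would need to rescan a tail of `window - 1` time units around the boundary; this accuracy-versus-efficiency trade-off is the kind of balance the abstract refers to.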
