Discovering utility-based episode rules in complex event sequences

Abstract Mining high utility episode rules in complex event sequences has emerged as an important topic in data mining, because utility-based episode rules can provide insights that facilitate decision making in expert and intelligent systems. Although previous methods in this area can be used to construct utility-based episode rules indirectly, they typically lack the efficiency and effectiveness needed for real-world applications. In this paper, we develop a novel methodology that generates high utility episode rules directly during the mining process; this is the first work to address utility-based episode rule mining. Our goal is to resolve the difficulties of previously reported methods for both frequent episode mining and utility-based episode mining. We propose an algorithm called UBER-Mine (Utility-Based Episode Rules) and a structure named UR-Tree (Utility Rule Tree) to efficiently mine the complete set of high utility episode rules in complex event sequences. UBER-Mine relies on an extended downward closure property to discover utility-based episode rules efficiently, while UR-Tree maintains important event information without producing candidate episodes, which further accelerates the mining process. Results on both real and synthetic datasets show that UBER-Mine with UR-Tree scales well to large datasets and runs over 100 times faster than both the basic UBER-Mine and the current best high utility episode mining algorithm. Furthermore, we propose a high utility episode rule model called IV-UBER (InVestment by Utility-Based Episode Rules) and use it to demonstrate the effectiveness of our method in a real-world stock investment application. The experimental results show that IV-UBER outperforms several state-of-the-art algorithms in terms of both precision and annualized return.
