On Incremental High Utility Sequential Pattern Mining

High utility sequential pattern (HUSP) mining is an emerging topic in pattern mining, and only a few algorithms have been proposed to address it. In practice, most sequence databases grow over time, and re-mining HUSPs from scratch with existing algorithms is inefficient when each update adds only a small portion of new data. In view of this, we propose the IncUSP-Miner+ algorithm to mine HUSPs incrementally. Specifically, to avoid redundant re-computation, we propose a tighter upper bound on the utility of a sequence, called Tight Sequence Utility (TSU), and design a novel data structure, called the candidate pattern tree, to buffer the sequences whose TSU values are greater than or equal to the minimum utility threshold in the original database. To avoid keeping a huge amount of utility information for each sequence, only a concise set of utility information is stored in each tree node. To improve mining efficiency, several strategies are proposed to reduce the amount of computation for utility updates and the scope of database scans. Moreover, several strategies are proposed to adjust the candidate pattern tree so that it supports multiple database updates. Experimental results on real and synthetic datasets show that IncUSP-Miner+ is able to mine HUSPs incrementally and efficiently.
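The buffering idea can be illustrated with a toy sketch. It is not the IncUSP-Miner+ algorithm: the abstract does not define TSU, so the looser sequence-weighted utility (the summed utility of every sequence containing the pattern) stands in as the upper bound, a flat dictionary stands in for the candidate pattern tree, and patterns are simplified to one item per element. All names (`CandidateBuffer`, `pattern_utility`, and so on) are hypothetical, and utilities are assumed to be positive.

```python
from typing import Dict, List, Tuple

QSequence = List[Dict[str, int]]  # each itemset maps item -> utility (assumed > 0)
Pattern = Tuple[str, ...]         # simplification: one item per element


def pattern_utility(pattern: Pattern, seq: QSequence) -> int:
    """Maximum utility of `pattern` over its occurrences in `seq`, 0 if absent."""
    NEG = float("-inf")
    # best[k] = best utility found so far for matching the first k pattern items
    best = [0] + [NEG] * len(pattern)
    for itemset in seq:
        # Iterate backwards so one itemset fills at most one pattern position.
        for k in range(len(pattern), 0, -1):
            item = pattern[k - 1]
            if item in itemset and best[k - 1] != NEG:
                best[k] = max(best[k], best[k - 1] + itemset[item])
    return 0 if best[-1] == NEG else best[-1]


def sequence_utility(seq: QSequence) -> int:
    """Total utility of all items in a sequence."""
    return sum(sum(itemset.values()) for itemset in seq)


class CandidateBuffer:
    """Flat stand-in for a candidate pattern tree (hypothetical structure)."""

    def __init__(self, min_util: int) -> None:
        self.min_util = min_util
        self.stats: Dict[Pattern, List[int]] = {}  # pattern -> [utility, bound]

    def scan(self, patterns: List[Pattern], sequences: List[QSequence]) -> None:
        """Accumulate utility and upper bound of each pattern over `sequences`."""
        for p in patterns:
            entry = self.stats.setdefault(p, [0, 0])
            for seq in sequences:
                u = pattern_utility(p, seq)
                if u > 0:
                    entry[0] += u
                    entry[1] += sequence_utility(seq)

    def prune(self) -> None:
        """Keep only patterns whose upper bound meets the threshold."""
        self.stats = {p: e for p, e in self.stats.items() if e[1] >= self.min_util}

    def high_utility(self) -> List[Pattern]:
        return sorted(p for p, (u, _) in self.stats.items() if u >= self.min_util)


original = [
    [{"a": 2, "b": 3}, {"c": 4}],
    [{"a": 1}, {"b": 2}, {"c": 1}],
]
increment = [[{"a": 5}, {"c": 6}]]  # sequences appended later

buf = CandidateBuffer(min_util=10)
buf.scan([("a", "c"), ("b", "c"), ("a", "b")], original)
buf.prune()
before = buf.high_utility()

# Update the buffered candidates by scanning only the appended sequences.
buf.scan(list(buf.stats), increment)
after = buf.high_utility()
print(before, after)
```

In this toy run the pattern `("a", "c")` is buffered but not high-utility in the original database, and becomes high-utility after the update without the original sequences being rescanned. The sketch deliberately omits what a complete incremental miner must also handle, such as patterns outside the buffer that become candidates after an update.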
