PTree: Mining Sequential Patterns Efficiently in Multiple Data Streams Environment

Although data streams have been widely studied and applied, sequential pattern mining over data streams remains challenging. In this paper, we assume that a user's transaction arrives in parts and that no auxiliary mechanism is available for buffering and integrating them. We adopt the Path Tree to mine frequent sequential patterns over data streams and to integrate each user's sequences efficiently. We propose two algorithms, one oriented toward accuracy (PAlgorithm) and one toward space (PSAlgorithm), to meet different user needs, as well as GAlgorithm for mining frequent sequential patterns under a gap limitation. Multiple pruning properties are used to further reduce the space usage and improve the accuracy of our algorithms. We also prove that PAlgorithm mines frequent sequential patterns with approximate support within a guaranteed error bound. Experiments on synthetic and real datasets verify the feasibility of our algorithms.
