FMGSP: An Efficient Method of Mining Global Sequential Patterns

Existing distributed sequential pattern mining algorithms often generate too many candidate sequences, which increases communication overhead. We therefore propose FMGSP (fast mining of global sequential patterns), an efficient algorithm for mining global sequential patterns in a distributed system. Our method differs from previous related work and makes two main contributions. First, the local sequential patterns obtained at every site in the distributed environment are compressed into a lexicographic sequence tree, and the subtrees are then distributed to the polling sites. Second, an efficient pruning strategy called I/S-EP (item and sequence extension pruning) is proposed to reduce the number of candidate sequences. As a result, the communication cost in the network is greatly reduced when counting requests are sent to (or received from) the corresponding databases. Both theoretical analysis and experiments indicate that FMGSP performs well on large databases and that it obtains the global sequential patterns effectively at reduced communication cost.
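The compression step can be illustrated with a minimal sketch. It assumes a common encoding of lexicographic sequence trees in which each edge is either a sequence extension ("S", the item opens a new itemset) or an item extension ("I", the item joins the last itemset). The names `Node`, `to_steps`, `build_tree`, `count_nodes`, and `subtrees` are hypothetical and are not taken from the paper, and the I/S-EP pruning strategy is not modeled here.

```python
from typing import Dict, List, Tuple

Sequence = List[Tuple[str, ...]]  # a sequence is an ordered list of itemsets
Step = Tuple[str, str]            # ("S", item) opens a new itemset, ("I", item) extends the last one


def to_steps(seq: Sequence) -> List[Step]:
    """Flatten a sequence into S-/I-extension steps (items sorted inside each itemset)."""
    steps: List[Step] = []
    for itemset in seq:
        for i, item in enumerate(sorted(itemset)):
            steps.append(("S" if i == 0 else "I", item))
    return steps


class Node:
    """One node of the (assumed) lexicographic sequence tree."""

    def __init__(self) -> None:
        self.children: Dict[Step, "Node"] = {}
        self.is_pattern = False
        self.support = 0


def build_tree(patterns: List[Tuple[Sequence, int]]) -> Node:
    """Insert every local pattern; patterns with a common prefix share nodes."""
    root = Node()
    for seq, support in patterns:
        node = root
        for step in to_steps(seq):
            node = node.children.setdefault(step, Node())
        node.is_pattern = True
        node.support = support
    return root


def count_nodes(node: Node) -> int:
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def subtrees(root: Node) -> Dict[str, Node]:
    """Split the tree by first item, e.g. to hand each subtree to a different site."""
    return {step[1]: child for step, child in root.children.items()}


# Hypothetical local patterns with their local support counts.
local_patterns = [
    ([("a",)], 3),          # <(a)>
    ([("a",), ("b",)], 2),  # <(a)(b)>
    ([("a", "b")], 2),      # <(ab)>
    ([("b",)], 4),          # <(b)>
]
tree = build_tree(local_patterns)
flat_size = sum(len(to_steps(seq)) for seq, _ in local_patterns)
```

In this toy input the three patterns that start with `a` share a single node for that prefix, so the tree holds fewer nodes than the flat list holds steps. How subtrees are actually assigned to polling sites is specific to the paper and is only approximated by `subtrees` above.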
