Analysis of Sequential Pattern Mining Algorithms

Sequential pattern mining is an important data mining problem with broad applications. Most of the previously developed sequential pattern mining methods, such as SPAM and SPADE, explore a candidate generation-and-test approach [12] that reduces the number of candidates to be examined. In this paper, we have implemented the SPADE, SPAM and PrefixSpan algorithms on two databases. The first is a sign database taken from the ASL (American Sign Language) database [11]. The second is the Kosarak dataset, which contains 10,000 sequences of click-stream data from a Hungarian news portal. The sign dataset is the dense dataset, with few distinct items, whereas Kosarak is the sparse dataset, with the larger number of distinct items. In the experimental results, SPADE performs best on both the dense and the sparse dataset used in the simulation study, while SPAM performs worst on the sparse dataset. On each dataset, all three algorithms generate the same number of sequential patterns. PrefixSpan uses the least memory on the dense dataset but the most on the sparse dataset. On the dense dataset, SPAM and SPADE use approximately constant memory, and on the sparse dataset SPADE uses the least memory.
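To make the contrast between candidate generation-and-test and PrefixSpan's prefix-projected pattern growth concrete, the following is a minimal Python sketch, not the implementations used in the experiments. It assumes that every sequence element is a single item (SPADE, SPAM and PrefixSpan also handle itemset elements) and that the minimum support is an absolute count of sequences. The names `prefixspan`, `generate_and_test` and `is_subsequence` are hypothetical helpers, and the baseline is a naive level-wise generate-and-test, not SPADE's id-list joins or SPAM's bitmaps.

```python
from collections import defaultdict


def is_subsequence(pattern, seq):
    """True if `pattern` occurs in `seq` in order (gaps allowed)."""
    it = iter(seq)
    # `item in it` consumes the iterator up to the first match,
    # so later items must appear after earlier ones.
    return all(item in it for item in pattern)


def generate_and_test(db, min_sup):
    """Naive level-wise baseline: generate length-(k+1) candidates, then scan db."""
    items = sorted({i for s in db for i in s})
    results = {}
    level = [(i,) for i in items]
    while level:
        frequent = []
        for cand in level:
            # Test step: count the sequences that contain the candidate.
            sup = sum(is_subsequence(cand, s) for s in db)
            if sup >= min_sup:
                results[cand] = sup
                frequent.append(cand)
        # Generation step: only frequent patterns are extended (Apriori property).
        level = [p + (i,) for p in frequent for i in items]
    return results


def prefixspan(db, min_sup):
    """Pattern growth: recurse on databases projected by the current prefix."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence id, start position of the suffix after prefix).
        counts = defaultdict(set)
        for sid, start in projected:
            for item in set(db[sid][start:]):
                counts[item].add(sid)
        for item, sids in sorted(counts.items()):
            if len(sids) < min_sup:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = len(sids)
            # Project: keep the suffix after the first occurrence of `item`.
            new_projected = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(sid, 0) for sid in range(len(db))])
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "b"], ["b", "c"]]
    print(prefixspan(db, 2) == generate_and_test(db, 2))
```

Because both approaches are exact, they return the same pattern-to-support map for a given database and threshold; they differ in how many candidates they examine and how much intermediate state they keep. This is why algorithms of this kind are compared on running time and memory rather than on the patterns they report.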

[1] Johannes Gehrke, et al. Sequential PAttern mining using a bitmap representation, 2002, KDD.

[2] Jian Pei, et al. Constraint-based sequential pattern mining: the pattern-growth methods, 2007, Journal of Intelligent Information Systems.

[3] Yue-Shi Lee, et al. Mining Sequential Patterns with Item Constraints, 2004, DaWaK.

[4] Umeshwar Dayal, et al. FreeSpan: frequent pattern-projected sequential pattern mining, 2000, KDD '00.

[5] Jean-François Boulicaut. If Constraint-Based Mining is the Answer: What is the Constraint? (Invited Talk), 2008, 2008 IEEE International Conference on Data Mining Workshops.

[6] Qiming Chen, et al. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth, 2001, Proceedings 17th International Conference on Data Engineering.

[7] Ramakrishnan Srikant, et al. Mining Sequential Patterns: Generalizations and Performance Improvements, 1996, EDBT.

[8] Ramakrishnan Srikant, et al. Fast algorithms for mining association rules, 1998, VLDB 1998.

[9] Alpa Reshamwala, et al. Prediction of Yahoo! Music Sequences on User’s Musical Taste, 2012, IAIT 2012.

[10] Suh-Yin Lee, et al. Incremental update on sequential patterns in large databases by implicit merging and efficient counting, 2004, Inf. Syst.

[11] A. Reshamwala, et al. Prediction of DoS attack sequences, 2012, 2012 International Conference on Communication, Information & Computing Technology (ICCICT).

[12] Tzung-Pei Hong, et al. Maintenance of sequential patterns for record deletion, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[13] Dong Liu, et al. Distributed PrefixSpan algorithm based on MapReduce, 2012, 2012 International Symposium on Information Technologies in Medicine and Education.

[14] Mohammed J. Zaki, et al. SPADE: An Efficient Algorithm for Mining Frequent Sequences, 2004, Machine Learning.

[15] Cláudia Antunes, et al. Generalization of Pattern-Growth Methods for Sequential Pattern Mining with Gap Constraints, 2003, MLDM.

[16] Ramakrishnan Srikant, et al. Mining sequential patterns, 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[17] Jia-Dong Ren, et al. An algorithm for mining generalized sequential patterns, 2004, Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826).

[18] Ryohei Orihara, et al. Discovery of Sequential Patterns Based On Constraint Patterns, 2008.