Benchmarking the effectiveness of sequential pattern mining methods

Recently, there is an increasing interest in new intelligent mining methods to find more meaningful and compact results. In intelligent data mining research, accessing the quality and usefulness of the results from different mining methods is essential. However, there is no general benchmarking criteria to evaluate whether these new methods are indeed more effective compared to the traditional methods. Here we propose a novel benchmarking criteria that can systematically evaluate the effectiveness of any sequential pattern mining method under a variety of situations. The benchmark evaluates how well a mining method finds known common patterns in synthetic data. Such an evaluation provides a comprehensive understanding of the resulting patterns generated from any mining method empirically. In this paper, the criteria are applied to conduct a detailed comparison study of the support-based sequential pattern model with an approximate pattern model based on sequence alignment. The study suggests that the alignment model will give a good summary of the sequential data in the form of a set of common patterns in the data. In contrast, the support model generates massive amounts of frequent patterns with much redundancy. This suggests that the results of the support model require more post processing before it can be of actual use in real applications.

[1]  Wei Wang,et al.  Comparative Study of Sequential Pattern Mining Models , 2005, Foundations of Data Mining and knowledge Discovery.

[2]  Johannes Gehrke,et al.  Sequential PAttern mining using a bitmap representation , 2002, KDD.

[3]  Philip S. Yu,et al.  Mining long sequential patterns in a noisy environment , 2002, SIGMOD '02.

[4]  Valerie Guralnik,et al.  A scalable algorithm for clustering sequential data , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[5]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[6]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[7]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[8]  Jian Pei,et al.  ApproxMAP: Approximate Mining of Consensus Sequential Patterns , 2003, SDM.

[9]  C. Metz Basic principles of ROC analysis. , 1978, Seminars in nuclear medicine.

[10]  Mohammed J. Zaki Efficient enumeration of frequent sequences , 1998, CIKM '98.

[11]  Myra Spiliopoulou,et al.  Managing Interesting Rules in Sequence Mining , 1999, PKDD.

[12]  Jianyong Wang,et al.  Mining sequential patterns by pattern-growth: the PrefixSpan approach , 2004, IEEE Transactions on Knowledge and Data Engineering.

[13]  Xifeng Yan,et al.  CloSpan: Mining Closed Sequential Patterns in Large Datasets , 2003, SDM.

[14]  Carla E. Brodley,et al.  KDD-Cup 2000 organizers' report: peeling the onion , 2000, SKDD.

[15]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[16]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.