Sequential pattern mining underlies many applications. Despite considerable effort in recent years, its performance remains far from satisfactory because of two main challenges: large search spaces and ineffectiveness in handling dense data sets. To address these challenges, we propose a series of novel algorithms, collectively called LAst Position INduction (LAPIN) sequential pattern mining. They are based on a simple idea: the last position of an item α is the key to judging whether a frequent k-length sequential pattern can be extended to a frequent (k+1)-length pattern by appending α to it. LAPIN largely reduces the search space during mining and is very effective on dense data sets. Our experiments and performance studies show that LAPIN outperforms PrefixSpan by up to an order of magnitude on dense data sets with long patterns.
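The last-position idea can be illustrated with a minimal sketch. It assumes each sequence is a plain list of single items (no itemsets), and the function names `last_positions`, `earliest_end`, and `extension_support` are hypothetical names chosen for illustration. It is not the published LAPIN algorithms, only the core test they build on: a sequence supports the pattern extended by α exactly when the last position of α lies after the point where the earliest match of the pattern ends.

```python
def last_positions(sequence):
    """Map each item to the index of its last occurrence in the sequence."""
    # Later occurrences overwrite earlier ones, so the final value is the last index.
    return {item: idx for idx, item in enumerate(sequence)}


def earliest_end(sequence, pattern):
    """Return the index where the leftmost match of `pattern` ends, or None.

    An empty pattern matches before the first element, so -1 is returned.
    """
    pos = -1
    for item in pattern:
        try:
            pos = sequence.index(item, pos + 1)
        except ValueError:
            return None
    return pos


def extension_support(database, pattern, item):
    """Count the sequences that contain `pattern` followed later by `item`.

    Only one comparison per sequence is needed once the last positions are
    known: last position of `item` > end of the earliest match of `pattern`.
    """
    count = 0
    for sequence in database:
        end = earliest_end(sequence, pattern)
        if end is None:
            continue  # the pattern itself does not occur in this sequence
        if last_positions(sequence).get(item, -1) > end:
            count += 1
    return count


# Toy database (illustrative data, not taken from the paper).
db = [list("abcab"), list("acb"), list("bac"), list("abab")]
support_aba = extension_support(db, ["a", "b"], "a")
support_abc = extension_support(db, ["a", "b"], "c")
```

With a minimum support of 2 on this toy database, appending `a` to the pattern `a, b` would keep it frequent while appending `c` would not. For simplicity the sketch recomputes the last positions on every call; computing them once per sequence and reusing them would be the natural next step.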
[1] Johannes Gehrke, et al. Sequential PAttern mining using a bitmap representation. KDD, 2002.
[2] Qiming Chen, et al. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. Proceedings 17th International Conference on Data Engineering, 2001.
[3] Ramakrishnan Srikant, et al. Mining sequential patterns. Proceedings of the Eleventh International Conference on Data Engineering, 1995.
[4] Ramakrishnan Srikant, et al. Mining Sequential Patterns: Generalizations and Performance Improvements. EDBT, 1996.
[5] Zhenglu Yang, et al. LAPIN: Effective Sequential Pattern Mining Algorithms by Last Position Induction for Dense Databases. DASFAA, 2007.
[6] Mohammed J. Zaki, et al. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning, 2004.