Much research has been done on discovering interesting and frequent user access patterns from web logs. Recently, a novel data structure known as the Web Access Pattern tree (WAP-tree) was developed. The associated WAP-mine algorithm is considerably faster than traditional sequential pattern mining techniques. However, WAP-mine must reconstruct large numbers of intermediate conditional WAP-trees during mining, which is itself very costly. In this paper, we propose an efficient WAP-tree mining algorithm, CS-mine (Conditional Sequence mining algorithm), which works directly on the initial conditional sequence base of each frequent event and so eliminates the need to reconstruct intermediate conditional WAP-trees. This significantly improves efficiency compared with WAP-mine, especially as the support threshold decreases and the database grows.
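To make the idea of mining from conditional sequence bases concrete, the following is a minimal sketch, not the CS-mine algorithm as specified in the paper. It assumes each web log session is a tuple of event labels, that support is counted once per session, and that the conditional sequence base of an event is the set of suffixes following its first occurrence in each session. The function name `mine_access_patterns` and its parameters are hypothetical, and no WAP-tree is built here; the sketch only shows how patterns can be grown recursively from sequence bases without constructing intermediate trees.

```python
from collections import Counter


def mine_access_patterns(sequences, min_support, prefix=()):
    """Return {pattern: support} for every frequent access pattern.

    sequences   -- list of tuples of hashable event labels (one per session)
    min_support -- absolute minimum number of supporting sessions
    prefix      -- pattern already found; used by the recursion
    """
    patterns = {}
    # Count each event at most once per sequence.
    counts = Counter(event for seq in sequences for event in set(seq))
    for event, support in sorted(counts.items()):
        if support < min_support:
            continue
        pattern = prefix + (event,)
        patterns[pattern] = support
        # Conditional sequence base (assumed definition): the suffix that
        # follows the first occurrence of `event` in each sequence.
        conditional_base = [
            seq[seq.index(event) + 1:] for seq in sequences if event in seq
        ]
        # Empty suffixes cannot contribute further events.
        conditional_base = [suffix for suffix in conditional_base if suffix]
        # Recurse on the sequence base directly; no tree is rebuilt.
        patterns.update(
            mine_access_patterns(conditional_base, min_support, pattern)
        )
    return patterns


if __name__ == "__main__":
    logs = [("a", "b", "a", "c"), ("a", "b", "c"), ("b", "a", "c")]
    for pattern, support in mine_access_patterns(logs, min_support=2).items():
        print("".join(pattern), support)
```

Lowering `min_support` enlarges every conditional sequence base that survives pruning, which is the regime where avoiding repeated tree reconstruction would be expected to matter most.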