An efficient model for information gain of sequential pattern from web logs based on dynamic weight constraint

Data mining is the task of discovering interesting patterns in large amounts of data. It covers many tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Many frequent sequential traversal pattern mining algorithms have been developed to mine the set of frequent traversal subsequences that satisfy a minimum support constraint in a session database. However, these algorithms give equal weight to all sequential traversal patterns, even though the pages in a pattern differ in importance and should carry different weights. A second problem is that most of these algorithms produce a very large number of sequential traversal patterns when the minimum support is lowered, and they offer no way to control the number of patterns other than raising the minimum support. In this paper, we propose frequent sequential traversal pattern mining with weight constraints. Our main approach is to push weight constraints into sequential traversal pattern mining while maintaining the downward closure property. Pages are assigned different weights, a weight range is defined, and each traversal sequence is assigned a minimum and a maximum weight. While scanning a session database, the maximum and minimum weights in the session database are used to prune infrequent sequential traversal subsequences, so that the downward closure property is maintained. By adjusting the weight range of pages and sequences, our method produces fewer but more important sequential traversal patterns in session databases, even at a low minimum support.
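The pruning idea can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following assumes a rule in the style of weighted sequential pattern miners such as WSpan: a candidate is discarded when its support multiplied by the maximum page weight in the database falls below the threshold, and it is reported when its support multiplied by the average weight of its pages reaches the threshold. The function names, page names, weights, and threshold are all hypothetical, and only the maximum weight bound is shown, not the full weight range.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], session: Sequence[str]) -> bool:
    """True if the pages of `pattern` occur in `session` in the same order."""
    it = iter(session)
    return all(page in it for page in pattern)


def support(pattern: Sequence[str], sessions: List[List[str]]) -> int:
    """Number of sessions that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sessions)


def mine_weighted(
    sessions: List[List[str]],
    weights: Dict[str, float],
    min_support: float,
    max_len: int = 3,
) -> Dict[Pattern, float]:
    """Level-wise sketch of weighted traversal pattern mining (assumed rule)."""
    max_w = max(weights.values())  # maximum page weight in the database
    pages = sorted(weights)
    results: Dict[Pattern, float] = {}
    frontier: List[Pattern] = [()]
    for _ in range(max_len):
        next_frontier: List[Pattern] = []
        for prefix in frontier:
            for page in pages:
                cand = prefix + (page,)
                sup = support(cand, sessions)
                # Assumed pruning bound: support never grows when a pattern
                # is extended, and no pattern weight exceeds max_w, so no
                # extension of `cand` can pass the threshold either.
                if sup * max_w < min_support:
                    continue
                next_frontier.append(cand)
                avg_w = sum(weights[p] for p in cand) / len(cand)
                if sup * avg_w >= min_support:
                    results[cand] = sup * avg_w
        frontier = next_frontier
    return results


# Hypothetical toy session database and page weights.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "cart", "checkout"],
    ["home", "about"],
    ["products", "cart"],
]
weights = {"home": 0.4, "products": 0.8, "cart": 1.0, "about": 0.2, "checkout": 0.9}

results = mine_weighted(sessions, weights, min_support=2.0)
for pattern in sorted(results):
    print(pattern, round(results[pattern], 2))
```

With these toy values the page `home` appears in three sessions, yet its low weight keeps it out of the result, while it still stays in the search frontier because the maximum weight bound does not rule out its extensions. This is the sense in which a weight constraint can reduce the number of reported patterns without raising the minimum support.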
