Mining weighted sequential patterns in a sequence database with a time-interval weight

Sequential pattern mining, including weighted sequential pattern mining, has attracted much attention because it is an essential data mining task with broad applications. Weighted sequential pattern mining aims to find more interesting sequential patterns by taking into account the differing significance of each data element in a sequence database. Conventional weighted sequential pattern mining usually relies on pre-assigned weights of data elements, which are derived from their quantitative information and their importance in real-world application domains. General sequential pattern mining considers the generation order of data elements when finding sequential patterns. However, the generation times of data elements and the time-intervals between them are also important in real-world application domains, so time-interval information can help in finding more interesting sequential patterns. This paper presents a new framework for finding time-interval weighted sequential (TiWS) patterns in a sequence database, together with a time-interval weighted support (TiW-support) measure used to find the TiWS patterns. A new method for mining TiWS patterns in a sequence database is also presented. In the proposed framework, the weight of each sequence in a sequence database is first obtained from the time-intervals of the elements in the sequence, and TiWS patterns are then found by taking this weight into account. A series of evaluation results shows that TiWS pattern mining is efficient and helpful in finding more interesting sequential patterns.
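
The two-step idea described above (derive a weight per sequence from its time-intervals, then compute support using those weights) can be illustrated with a minimal sketch. The abstract does not give the actual weight function or the exact TiW-support definition, so everything concrete here is an assumption made for illustration: the hypothetical `sequence_weight` uses `1 / (1 + mean gap / unit)` so that sequences with shorter intervals weigh more, and the hypothetical `tiw_support` is the weight of the sequences containing a pattern divided by the total weight of the database.

```python
from typing import FrozenSet, List, Tuple

# A sequence is a time-ordered list of (timestamp, itemset) pairs.
Sequence = List[Tuple[float, FrozenSet[str]]]


def sequence_weight(seq: Sequence, unit: float = 1.0) -> float:
    """Hypothetical weight: decreases as the mean time-interval grows.

    This formula is an assumption, not the paper's definition.
    """
    if len(seq) < 2:
        return 1.0  # no interval to measure, so use the maximum weight
    gaps = [later[0] - earlier[0] for earlier, later in zip(seq, seq[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return 1.0 / (1.0 + mean_gap / unit)


def contains(seq: Sequence, pattern: List[FrozenSet[str]]) -> bool:
    """True if every itemset of `pattern` appears, in order, within `seq`."""
    matched = 0
    for _, itemset in seq:
        if matched < len(pattern) and pattern[matched] <= itemset:
            matched += 1
    return matched == len(pattern)


def tiw_support(db: List[Sequence], pattern: List[FrozenSet[str]],
                unit: float = 1.0) -> float:
    """Assumed TiW-support: weight of supporting sequences over total weight."""
    weights = [sequence_weight(s, unit) for s in db]
    total = sum(weights)
    supporting = sum(w for s, w in zip(db, weights) if contains(s, pattern))
    return supporting / total if total else 0.0


# Toy database: three sequences over items "a" and "b".
db = [
    [(0, frozenset("a")), (1, frozenset("b"))],  # gap 1, weight 0.5
    [(0, frozenset("a")), (3, frozenset("b"))],  # gap 3, weight 0.25
    [(0, frozenset("b")), (1, frozenset("a"))],  # gap 1, weight 0.5
]
pattern = [frozenset("a"), frozenset("b")]
support = tiw_support(db, pattern)
```

In this toy database, two of the three sequences contain the pattern, so plain support would be 2/3. Under the assumed weighting the value is lower, because one of the two supporting sequences has a long interval and therefore contributes less weight.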
