A General Model for Sequential Pattern Mining with a Progressive Database

Although many recent studies have addressed the mining of sequential patterns in a static database and in a database with increasing data, these works generally do not fully explore the effect of deleting old data from the sequences in the database. When sequential patterns are generated, newly arriving patterns may not be identified as frequent because old data and old sequences are still counted. Even worse, obsolete sequential patterns that are no longer frequent in recent data may stay in the reported results. In practice, users are usually more interested in recent data than in old data. To capture the dynamic nature of data addition and deletion, we propose a general model of sequential pattern mining with a progressive database, in which data may be static, inserted, or deleted. In addition, we present a progressive algorithm, Pisa, which stands for progressive mining of sequential patterns, to progressively discover sequential patterns in a defined time period of interest (POI). The POI is a sliding window that advances continuously as time goes by. Pisa uses a progressive sequential tree to efficiently maintain the latest data sequences, discover the complete set of up-to-date sequential patterns, and delete obsolete data and patterns accordingly. The height of this tree is bounded by the length of the POI, which limits the memory required by Pisa to significantly less than that needed by the alternative method, direct appending (DirApp). Note that sequential pattern mining with a static database and with an incremental database are special cases of progressive sequential pattern mining: by changing the start time and end time of the POI, Pisa can also handle a static database or an incremental database. The complexity of the proposed algorithms is analyzed. The experimental results show that Pisa not only outperforms the prior methods in execution time by orders of magnitude but also scales gracefully.
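
The progressive database model can be illustrated with a small brute-force sketch. This is not the Pisa algorithm and does not build a progressive sequential tree; it simply recounts patterns from scratch for a given POI to show why restricting the count to recent data changes the result. The function name `frequent_patterns_in_poi`, the `(sequence_id, timestamp, item)` event layout, the absolute support threshold, and the restriction to one item per timestamp are all assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def frequent_patterns_in_poi(events, poi_start, poi_end, min_support, max_len=2):
    """Brute-force count of sequential patterns inside one POI.

    events: iterable of (sequence_id, timestamp, item) tuples (hypothetical layout).
    Only elements with poi_start <= timestamp <= poi_end are counted.
    min_support is an absolute number of sequences (an assumption of this sketch).
    """
    # Keep only elements inside the POI, grouped per sequence in time order.
    windowed = defaultdict(list)
    for sid, ts, item in sorted(events, key=lambda e: (e[0], e[1])):
        if poi_start <= ts <= poi_end:
            windowed[sid].append(item)

    # Each sequence contributes at most 1 to the support of a pattern.
    support = defaultdict(int)
    for items in windowed.values():
        seen = set()
        for k in range(1, max_len + 1):
            # combinations() preserves input order, so each tuple is a subsequence.
            seen.update(combinations(items, k))
        for pattern in seen:
            support[pattern] += 1

    return {p: s for p, s in support.items() if s >= min_support}


# Toy data: "A then B" is common early on, but only "C" is common recently.
events = [
    ("s1", 1, "A"), ("s1", 2, "B"), ("s1", 5, "C"),
    ("s2", 1, "A"), ("s2", 3, "B"), ("s2", 6, "C"),
    ("s3", 4, "B"), ("s3", 5, "C"),
]

early = frequent_patterns_in_poi(events, 1, 3, min_support=2)    # POI = [1, 3]
recent = frequent_patterns_in_poi(events, 4, 6, min_support=2)   # POI = [4, 6]
static = frequent_patterns_in_poi(events, 1, 6, min_support=2)   # whole database
```

With this toy data, the pattern `("A", "B")` is frequent in the early POI and in the whole-database (static) result, but it is absent from the recent POI, where only `("C",)` reaches the threshold. Setting the POI to cover the entire time range reproduces the static case, and fixing the start time while advancing only the end time reproduces the incremental case.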
