Incremental mining of sequential patterns in large databases

In this paper, we consider the problem of the incremental mining of sequential patterns when new transactions or new customers are added to an original database. We present a new algorithm for mining frequent sequences that uses information collected during an earlier mining process to cut down the cost of finding new sequential patterns in the updated database. Our test shows that the algorithm performs significantly faster than the naive approach of mining the whole updated database from scratch. The difference is so pronounced that this algorithm could also be useful for mining sequential patterns, since in many cases it is faster to apply our algorithm than to mine sequential patterns using a standard algorithm, by breaking down the database into an original database plus an increment.

[1]  Ke Wang,et al.  Discovering Patterns from Large and Dynamic Sequential Data , 1997, Journal of Intelligent Information Systems.

[2]  Nicolas Pasquier,et al.  Efficient Mining of Association Rules Using Closed Itemset Lattices , 1999, Inf. Syst..

[3]  Wei Li,et al.  Scalable data mining for rules , 1998 .

[4]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[5]  Hannu T. T. Toivonen,et al.  Samplinglarge databases for finding association rules , 1996, VLDB 1996.

[6]  Rajeev Motwani,et al.  Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[7]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[8]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[9]  David Wai-Lok Cheung,et al.  A General Incremental Technique for Maintaining Discovered Association Rules , 1997, DASFAA.

[10]  Philippe Pucheral,et al.  Bitmap based algorithms for mining association rules , 1998, BDA.

[11]  Sanjay Ranka,et al.  An Efficient Algorithm for the Incremental Updation of Association Rules in Large Databases , 1997, KDD.

[12]  John F. Roddick,et al.  Incremental Maintenance Techniques for Discovered Classification Rules , 1996, CODAS.

[13]  Suh-Yin Lee,et al.  Incremental update on sequential patterns in large databases , 1998, Proceedings Tenth IEEE International Conference on Tools with Artificial Intelligence (Cat. No.98CH36294).

[14]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[15]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[16]  Werner Nutt,et al.  Subsumption between queries to object-oriented databases , 1994, Inf. Syst..

[17]  David Wai-Lok Cheung,et al.  Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules , 1998, Data Mining and Knowledge Discovery.

[18]  Jiawei Han,et al.  IncSpan: incremental mining of sequential patterns in large database , 2004, KDD.

[19]  Shamkant B. Navathe,et al.  An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.

[20]  Florent Masseglia,et al.  An efficient algorithm for Web usage mining , 1999 .

[21]  Jiawei Han,et al.  Maintenance of discovered association rules in large databases: an incremental updating technique , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[22]  Yonatan Aumann,et al.  Efficient Algorithms for Discovering Frequent Sets in Incremental Databases , 1997, DMKD.

[23]  Srinivasan Parthasarathy,et al.  Incremental and interactive sequence mining , 1999, CIKM '99.

[24]  Nandlal L. Sarda,et al.  An adaptive algorithm for incremental mining of association rules , 1998, Proceedings Ninth International Workshop on Database and Expert Systems Applications (Cat. No.98EX130).

[25]  Vikram Pudi,et al.  Quantifying the Utility of the Past in Mining Large Databases , 2000, Inf. Syst..