Sequence Pattern Mining: An Incremental Approach

The basic idea of sequential pattern mining was first introduced by Agrawal and Srikant [1]. The sequence mining task is to discover sets of attributes that are shared across time among a large number of objects in a given database. For example, consider the sales database of a bookstore, where the objects represent customers and the attributes represent authors or books, and suppose the database records the books bought by each customer over a period of time. The discovered patterns are the sequences of books most frequently bought by customers, for example, “70% of the people who buy Jane Austen’s Pride and Prejudice also buy Emma within a month.” Stores can use these patterns for promotions, shelf placement, and similar decisions.

Sequential mining algorithms can mine a static database. Most databases today, however, are dynamic and grow incrementally. One way to handle this is to re-mine the whole database every time an update occurs, but this is highly inefficient. A better approach is to reuse the information that has already been mined, and this is what an incremental mining algorithm does: it uses the previously mined information to obtain the new set of frequent sequential patterns instead of mining the whole database from scratch. The main reason to prefer an incremental mining algorithm over a non-incremental one is efficiency with respect to time; in every other respect a non-incremental algorithm serves the purpose equally well. For an incremental mining algorithm, therefore, the time taken to mine the complete set of frequent patterns must be considered [5]. Various algorithms exist for both non-incremental and incremental sequential pattern mining.

A. Notation

The notation used in this approach is defined below.

- D: the original customer sequences.
- T: the set of newly merged customer sequences from the newly inserted customer sequences.
- U: the entire updated customer sequences.
- q: the number of newly added customer sequences belonging to old customers in the original database.
- S_u: the upper support threshold for large sequences.
- S_l: the lower support threshold for pre-large sequences, S_l < S_u.
- L_k^D: the set of large k-sequences from D.
- L_k^T: the set of large k-sequences from T.
- L_k^U: the set of large k-sequences from U.
- P_k^D: the set of pre-large k-sequences from D.
- P_k^T: the set of pre-large k-sequences from T.
- P_k^U: the set of pre-large k-sequences from U.
- C_k: the set of all candidate k-sequences from T.
- I: a sequence.
- S^D(I): the number of occurrences of I in D.
- S^T(I): the number of occurrence increments of I in T.
- S^U(I): the number of occurrences of I in U.
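To make the notation concrete, the following is a minimal Python sketch of how an occurrence count such as S(I) in D could be computed and compared against the two support thresholds. It rests on assumptions not stated above: a customer sequence is modelled as an ordered list of itemsets, a sequence I occurs in a customer sequence when its itemsets are contained, in order, in distinct itemsets of that customer sequence, and a sequence is treated as large when its support ratio is at least the upper threshold and as pre-large when the ratio lies between the lower and upper thresholds. The function names, threshold values, and sample data are hypothetical and for illustration only.

```python
from typing import List, Set

# Assumption: a customer sequence is an ordered list of itemsets (sets of item names).
CustomerSequence = List[Set[str]]


def contains(customer_seq: CustomerSequence, pattern: CustomerSequence) -> bool:
    """Return True if `pattern` occurs in `customer_seq` as an ordered subsequence.

    Each itemset of the pattern must be a subset of a distinct itemset of the
    customer sequence, and the matches must appear in the same order.
    """
    pos = 0
    for itemset in pattern:
        # Skip forward to the next transaction that contains this itemset.
        while pos < len(customer_seq) and not itemset <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next pattern itemset must match a later transaction
    return True


def support_count(db: List[CustomerSequence], pattern: CustomerSequence) -> int:
    """Number of customer sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if contains(seq, pattern))


def classify(count: int, n_sequences: int, s_u: float, s_l: float) -> str:
    """Label a sequence as large, pre-large, or small (assumed threshold rule)."""
    ratio = count / n_sequences
    if ratio >= s_u:
        return "large"
    if ratio >= s_l:
        return "pre-large"
    return "small"


# Hypothetical bookstore data: one entry per customer.
D = [
    [{"pride"}, {"emma"}],
    [{"pride"}, {"persuasion"}, {"emma"}],
    [{"emma"}, {"pride"}],
    [{"pride", "emma"}],
    [{"persuasion"}],
]
I = [{"pride"}, {"emma"}]  # "Pride and Prejudice, then later Emma"

count_in_D = support_count(D, I)
label = classify(count_in_D, len(D), s_u=0.5, s_l=0.3)
```

With this sample data only the first two customers buy the two books in the required order, so the count is 2 out of 5 and the sequence falls between the two illustrative thresholds. Keeping such pre-large sequences alongside the large ones is what lets an incremental algorithm defer a full rescan of D when new customer sequences arrive.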

[1] Qiming Chen, et al. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth, 2001, Proceedings 17th International Conference on Data Engineering.

[2] Shanmugasundaram Hariharan, et al. A Survey On Distributed Data Mining Process Via Grid, 2011.

[3] Ramakrishnan Srikant, et al. Mining sequential patterns, 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[4] Mohammed J. Zaki, et al. SPADE: An Efficient Algorithm for Mining Frequent Sequences, 2004, Machine Learning.

[5] A. Chaturvedi, et al. Frequent Pattern Mining Using Record Filter Approach, 2010.

[6] Anjan K. Koundinya, et al. Map/Reduce Design and Implementation of Apriori Algorithm for handling voluminous data-sets, 2012, ArXiv.

[7] Umeshwar Dayal, et al. FreeSpan: frequent pattern-projected sequential pattern mining, 2000, KDD '00.

[8] Jian Pei, et al. Data Mining: Concepts and Techniques, 3rd edition, 2006.

[9] Ramakrishnan Srikant, et al. Mining Sequential Patterns: Generalizations and Performance Improvements, 1996, EDBT.

[10] Johannes Gehrke, et al. Sequential PAttern mining using a bitmap representation, 2002, KDD.

[11] Nhien-An Le-Khac, et al. Distributed Frequent Itemsets Mining in Heterogeneous Platforms, 2007.

[12] Jiawei Han, et al. IncSpan: incremental mining of sequential patterns in large database, 2004, KDD.

[13] Maguelonne Teisseire, et al. Incremental mining of sequential patterns in large databases, 2003, Data Knowl. Eng.