Sequential Pattern Mining: Survey and Current Research Challenges

Abstract: Sequential pattern mining was introduced by Rakesh Agrawal and Ramakrishnan Srikant in 1995 in the context of market analysis. Its aim was to find frequent patterns in the sequences of products that customers purchase across time-ordered transactions. It was later extended to more complex applications such as telecommunications, network detection, and DNA research. Several algorithms have been proposed. The earliest were Apriori-based algorithms put forward by Agrawal and Srikant themselves; more scalable algorithms for complex applications followed, e.g. GSP, SPADE, and PrefixSpan. The area has advanced considerably in the short span since its introduction. This paper presents a systematic survey of sequential pattern mining algorithms, classifying them into two broad categories: first, algorithms designed to increase the efficiency of mining; and second, extensions of sequential pattern mining designed for particular applications. Finally, a comparative analysis is made on the basis of the key features supported by the various algorithms, and current research challenges in this field of data mining are discussed.
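The mining problem described in the abstract can be illustrated with a small sketch. It is not the method of any surveyed paper: it assumes each transaction holds a single product (the original formulation allows itemsets), the customer data is invented for illustration, and the names `is_subsequence`, `support`, and `frequent_sequences` are hypothetical. The search grows patterns over projected suffixes, loosely in the spirit of prefix-projected pattern growth.

```python
from collections import defaultdict


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of customer sequences that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_sequences(database, min_support):
    """Return {pattern: support} for every pattern with support >= min_support.

    Simplifying assumption: every transaction is a single item.
    """
    results = {}

    def grow(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):
                counts[item] += 1
        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Keep only what follows the first occurrence of item.
            next_projected = [
                s[s.index(item) + 1:] for s in projected if item in s
            ]
            grow(pattern, next_projected)

    grow((), [tuple(seq) for seq in database])
    return results


# Illustrative data: one time-ordered purchase sequence per customer.
customers = [
    ["bread", "milk", "butter"],
    ["bread", "butter"],
    ["milk", "bread", "butter"],
    ["milk", "butter"],
]
patterns = frequent_sequences(customers, min_support=3)
for pattern, count in sorted(patterns.items()):
    print(" -> ".join(pattern), count)
```

With a minimum support of 3, the sketch reports that "bread" followed later by "butter" and "milk" followed later by "butter" each occur for three of the four customers, while no frequent pattern orders bread and milk relative to each other.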
