A Survey of Sequential Pattern Mining

Discovering unexpected and useful patterns in databases is a fundamental data mining task. In recent years, a trend in data mining has been to design algorithms for discovering patterns in sequential data. One of the most popular data mining tasks on sequences is sequential pattern mining, which consists of discovering interesting subsequences in a set of sequences, where the interestingness of a subsequence can be measured by criteria such as its occurrence frequency, length, and profit. Sequential pattern mining has many real-life applications because data is encoded as sequences in many fields, such as bioinformatics, e-learning, market basket analysis, text analysis, and webpage click-stream analysis. This paper surveys recent studies on sequential pattern mining and its applications. The goal is to provide both an introduction to sequential pattern mining and a survey of recent advances and research opportunities. The paper is divided into four main parts. First, the task of sequential pattern mining is defined, its applications are reviewed, and key concepts and terminology are introduced. Moreover, the main approaches and strategies for solving sequential pattern mining problems are presented. Limitations of traditional sequential pattern mining approaches are also highlighted, and popular variations of the task are presented. The paper also presents research opportunities and the relationship to other popular pattern mining problems. Lastly, the paper discusses open-source implementations of sequential pattern mining algorithms.
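To make the task definition concrete, the following minimal Python sketch counts the occurrence frequency (support) of a candidate subsequence in a small set of sequences. It is an illustration under simplifying assumptions, not an algorithm from the surveyed literature: each sequence is a plain list of single items (general formulations allow itemsets at each position), and the names `is_subsequence`, `support`, and the toy `database` are hypothetical.

```python
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if the items of `pattern` appear in `sequence`
    in the same order, not necessarily contiguously."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until the item is found,
    # so each pattern item must occur after the previous match.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Count how many sequences of `database` contain `pattern`."""
    return sum(1 for sequence in database if is_subsequence(pattern, sequence))


# Hypothetical toy database of four sequences (assumption for illustration).
database = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["b", "a", "d"],
    ["c", "b"],
]

if __name__ == "__main__":
    minsup = 2  # assumed minimum support threshold
    for candidate in (["a", "d"], ["b", "c"]):
        count = support(candidate, database)
        label = "frequent" if count >= minsup else "infrequent"
        print(candidate, count, label)
```

With a minimum support of 2, the candidate `a, d` would be reported as frequent in this toy database, while `b, c` would not. Practical algorithms avoid testing candidates one by one in this way and instead prune the search space.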
