Sequential pattern mining -- approaches and algorithms

Sequences of events, items, or tokens occurring in an ordered metric space appear often in data, and detecting and analyzing the frequent subsequences within them is a common problem. Sequential pattern mining arose as a subfield of data mining to address this problem. This article surveys the approaches and algorithms proposed to date.
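The following minimal Python sketch makes the core problem concrete. It assumes each sequence is a plain list of single items. The classical formulation instead allows a set of items at each position, and many variants add time or gap constraints. A pattern's support is the number of database sequences that contain it as an ordered, not necessarily contiguous, subsequence. The search is a naive level-wise (Apriori-style) enumeration. It relies on the anti-monotone property: every subsequence of a frequent sequence is itself frequent. The function names are hypothetical and chosen for illustration. This is not any specific published algorithm, and it rescans the whole database for every candidate, which is the cost practical algorithms are designed to avoid.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `sequence` in the same order, gaps allowed."""
    it = iter(sequence)
    # `item in it` advances the iterator past the first match, so each
    # subsequent item must be found strictly after the previous one.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Number of sequences in the database that contain `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def frequent_subsequences(database: List[Sequence[str]], min_support: int) -> Dict[Pattern, int]:
    """Level-wise search for all subsequences whose support is at least `min_support`."""
    if min_support < 1:
        # With a threshold of 0 every candidate qualifies and the search never ends.
        raise ValueError("min_support must be at least 1")
    items = sorted({item for seq in database for item in seq})

    # Level 1: frequent single items.
    current: Dict[Pattern, int] = {}
    for item in items:
        count = support((item,), database)
        if count >= min_support:
            current[(item,)] = count
    frequent_items = [pattern[0] for pattern in current]
    result = dict(current)

    while current:
        next_level: Dict[Pattern, int] = {}
        # Extend every frequent length-k pattern by one frequent item.
        for pattern, item in product(current, frequent_items):
            candidate = pattern + (item,)
            # Apriori pruning: dropping any one element yields a length-k
            # subsequence, which must itself be frequent.
            if any(candidate[:i] + candidate[i + 1:] not in current
                   for i in range(len(candidate))):
                continue
            count = support(candidate, database)
            if count >= min_support:
                next_level[candidate] = count
        result.update(next_level)
        current = next_level
    return result


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "b"], ["b", "c"]]
    for pattern, count in frequent_subsequences(db, min_support=3).items():
        print("".join(pattern), count)
```

On this toy database with a minimum support of 3, the sketch reports the single items `a`, `b`, `c` and the two-item patterns `ac` and `bc`. The pattern `ab` is rejected because only two of the four sequences contain it.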
