Approaches for pattern discovery using sequential data mining

In this chapter, we first introduce sequence data. We then discuss the different approaches to mining patterns from sequence data that have been studied in the literature. Apriori-based methods and pattern-growth methods are the earliest and most influential approaches to sequential pattern mining. There is also a vertical-format-based method, which works on a dual representation of the sequence database. Work has also been done on mining patterns with constraints, mining closed patterns, mining patterns from multidimensional databases, mining closed repetitive gapped subsequences, and other forms of sequential pattern mining. Some works focus on incremental pattern mining and on mining from stream data. We present at least one method of each type and discuss its advantages and disadvantages. We conclude with a summary of the work.
