OrderSpan: Mining Closed Partially Ordered Patterns

Because of the complexity of the task, mining partially ordered patterns from sequential data has received little study despite its usefulness. This paper addresses this data mining challenge by describing OrderSpan, a new algorithm that extracts such patterns from sequential databases and overcomes some of the drawbacks of existing methods. Our contribution is a simple and flexible framework that directly mines complex sequences of itemsets by combining well-known properties of prefixes and suffixes. Experiments on several real datasets show the benefit of partially ordered patterns.
