Mining closed partially ordered patterns: a new optimized algorithm

Sequence databases of increasing size are now available in many domains, so exploring them with new pattern mining approaches and new data structures is important. This paper addresses this data mining challenge by presenting OrderSpan, an algorithm that extracts closed partially ordered patterns from a sequence database. OrderSpan combines well-known properties of prefixes and suffixes. We further extend OrderSpan by adapting efficient optimizations from the sequential pattern mining domain. The resulting method is flexible and follows the sequential pattern paradigm, and it explores the search space more efficiently because it skips redundant branches. Experiments on several real datasets show (1) the effectiveness of the optimized approach and (2) the benefit of closed partially ordered patterns over closed sequential patterns.
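To make the notion of a closed partially ordered pattern concrete, here is a minimal brute-force sketch in Python. It is not OrderSpan and uses no prefix/suffix projection or pruning: it enumerates every subset of sequences, which is exponential. It assumes that each item occurs at most once per sequence (an assumption not stated in the abstract), and the function names are hypothetical. For a set of sequences, it builds the most specific partial order they all share, then keeps that pattern only if the sequences supporting it are exactly that set, which is the closedness condition.

```python
from itertools import combinations


def common_partial_order(seqs):
    """Most specific partial order shared by all sequences.

    Assumes (hypothetically) that every item occurs at most once per sequence.
    Returns (items, edges) where (a, b) in edges means a precedes b everywhere.
    """
    items = set(seqs[0])
    for s in seqs[1:]:
        items &= set(s)  # keep only items present in every sequence
    edges = set()
    for a in items:
        for b in items:
            if a != b and all(s.index(a) < s.index(b) for s in seqs):
                edges.add((a, b))
    return frozenset(items), frozenset(edges)


def supports(seq, pattern):
    """True if seq contains all pattern items and respects every precedence edge."""
    items, edges = pattern
    if not items <= set(seq):
        return False
    return all(seq.index(a) < seq.index(b) for a, b in edges)


def closed_partial_orders(db, minsup):
    """Map each closed partially ordered pattern to its support set (sequence ids)."""
    result = {}
    ids = range(len(db))
    for k in range(minsup, len(db) + 1):
        for subset in combinations(ids, k):
            pattern = common_partial_order([db[i] for i in subset])
            if not pattern[0]:
                continue  # no item shared by every sequence in the subset
            support = frozenset(i for i in ids if supports(db[i], pattern))
            # Closed: no other sequence supports this pattern.
            if support == frozenset(subset):
                result[pattern] = support
    return result
```

For example, `closed_partial_orders(["abcd", "acbd", "abd"], 2)` returns two patterns: `a` before `b` and `c`, which both precede `d` (supported by the first two sequences), and `a` before `b` before `d` (supported by all three). The first pattern records in a single partial order that `b` and `c` occur between `a` and `d` in either order, whereas a closed sequential pattern miner would report separate patterns such as `abd` and `acd`.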
