Mining sequential patterns with itemset constraints

Sequential pattern mining discovers all frequent sequences in a sequence database. However, it may return a huge number of patterns, while users are interested in only a particular subset of them. In this paper, we consider the problem of mining sequential patterns with itemset constraints. To solve this problem, we propose a new algorithm named MSPIC-DBV, a pattern-growth algorithm that uses prefixes and dynamic bit vectors. The algorithm prunes the search space both at the beginning of and during the mining process, and it reduces the number of candidates that need to be checked. Experimental results show that the proposed algorithm outperforms previous methods.
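To make the problem statement concrete, the following is a minimal brute-force sketch of what "frequent sequential patterns with itemset constraints" could mean. It is not MSPIC-DBV: it uses no prefixes, dynamic bit vectors, or pruning. The constraint semantics (a pattern qualifies when at least one of its itemsets contains some constraint itemset) is an assumption made for illustration, and the names `is_subsequence`, `support`, `satisfies_constraint`, and `filter_patterns` are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]
Seq = Sequence[Itemset]


def is_subsequence(pattern: Seq, sequence: Seq) -> bool:
    """Return True if every itemset of `pattern` is contained, in order,
    in a distinct itemset of `sequence` (greedy left-to-right matching)."""
    pos = 0
    for p in pattern:
        # Advance to the first remaining itemset that contains p.
        while pos < len(sequence) and not p <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def support(pattern: Seq, database: Sequence[Seq]) -> int:
    """Count the sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database)


def satisfies_constraint(pattern: Seq, constraints: Sequence[Itemset]) -> bool:
    """ASSUMED semantics: some itemset of the pattern is a superset of
    some constraint itemset. The paper's exact definition may differ."""
    return any(c <= p for p in pattern for c in constraints)


def filter_patterns(candidates: Sequence[Seq], database: Sequence[Seq],
                    constraints: Sequence[Itemset], min_support: int) -> List[Seq]:
    """Keep candidates that satisfy the constraint and are frequent.
    The cheap constraint test runs before the support count."""
    return [p for p in candidates
            if satisfies_constraint(p, constraints)
            and support(p, database) >= min_support]


# Toy data, invented for this example only.
database = [
    [frozenset("a"), frozenset("bc"), frozenset("d")],
    [frozenset("ab"), frozenset("c")],
    [frozenset("b"), frozenset("a"), frozenset("bc")],
]
candidates = [
    [frozenset("a"), frozenset("c")],
    [frozenset("bc")],
    [frozenset("a"), frozenset("d")],
]
constraints = [frozenset("c")]

kept = filter_patterns(candidates, database, constraints, min_support=2)
```

On this toy data, with the assumed semantics and a minimum support of 2, the first two candidates are kept and the third is rejected because none of its itemsets contains the constraint item. A real miner would generate candidates by extending prefixes rather than receiving them as input, which is where pruning at the start of and during mining would matter.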
