F-NSP+: A fast negative sequential patterns mining method with self-adaptive data storage

Abstract: Mining negative sequential patterns (NSPs) is an important tool for nonoccurring behavior analysis. It is much more challenging than mining positive sequential patterns (PSPs) because obtaining the support of negative sequential candidates (NSCs) involves high computational complexity and a huge search space. Very few NSP mining algorithms are available, and most of them are very inefficient because they obtain the support of NSCs by scanning the database repeatedly. In contrast, the state-of-the-art NSP mining algorithm e-NSP uses only the PSP information stored in an array structure to 'calculate' the support of NSCs by equations, without re-scanning the database. This makes e-NSP highly efficient, particularly on sparse datasets. However, when datasets become dense, the key process for obtaining the support of NSCs in e-NSP becomes very time-consuming and needs to be improved. In this paper, we propose a novel and efficient data structure, a bitmap, for obtaining the support of NSCs. We correspondingly propose a fast NSP mining algorithm, f-NSP, which stores PSP information in a bitmap and then obtains the support of NSCs using only bitwise operations, which is much faster than the hash method in e-NSP. Experimental results on real-world and synthetic datasets show that f-NSP is not only tens to hundreds of times faster than e-NSP but also needs less than one-tenth of its storage space, particularly on dense datasets with a large number of elements per sequence or a small number of itemsets. Further, we find that f-NSP consumes more storage space than e-NSP when a PSP's support is less than a support threshold sdsup, a value obtained through our theoretical analysis of storage space. Accordingly, we propose a self-adaptive storage strategy and a corresponding algorithm, f-NSP+, to overcome this deficiency. f-NSP+ automatically chooses a bitmap or an array structure to store PSP information according to the PSP's support. Experimental results show that f-NSP+ uses less storage space than f-NSP and has similar time efficiency.
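The bitmap idea and the self-adaptive storage choice described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each PSP is stored as the set of ids of the sequences that contain it, and that the support of an NSC is the support of its positive partner minus the number of sequences that also contain any of the positive patterns obtained by turning one negative element positive (an e-NSP-style equation that the abstract itself does not spell out). The function names and the way sdsup is passed in as a relative support are hypothetical.

```python
def to_bitmap(sids):
    """Pack sequence ids into a Python int used as a bitmap (bit i set = sequence i)."""
    bits = 0
    for sid in sids:
        bits |= 1 << sid
    return bits


def nsc_support(mps_bitmap, one_neg_bitmaps):
    """Support of an NSC using only bitwise operations.

    mps_bitmap: sequences containing the NSC's positive partner (assumed input).
    one_neg_bitmaps: one bitmap per negative element, marking the sequences in
    which that element does occur in the required position (assumed input).
    """
    union = 0
    for bitmap in one_neg_bitmaps:
        union |= bitmap  # sequences that violate at least one negative element
    # Keep the sequences that match the positive partner and violate nothing.
    return bin(mps_bitmap & ~union).count("1")


def store_psp(sids, n_sequences, sdsup):
    """Hypothetical self-adaptive storage: array below sdsup, bitmap otherwise.

    sdsup is taken as a given relative support threshold; the paper derives its
    value from a storage analysis that is not reproduced here.
    """
    support = len(sids) / n_sequences
    if support < sdsup:
        return ("array", sorted(sids))
    return ("bitmap", to_bitmap(sids))


# <a c> occurs in sequences 0, 1, 2, 3, 5; <a b c> occurs in sequences 1 and 3.
a_c = to_bitmap([0, 1, 2, 3, 5])
a_b_c = to_bitmap([1, 3])
support_a_notb_c = nsc_support(a_c, [a_b_c])  # support of <a not-b c>
```

In this toy example, two of the five sequences containing <a c> also contain <a b c>, so the candidate <a ¬b c> has support 3. A low-support PSP such as one found in 2 of 10 sequences would be kept as a short id array when sdsup is 0.5, while a PSP found in 6 of 10 would be kept as a bitmap.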
