e-RNSP: An Efficient Method for Mining Repetition Negative Sequential Patterns

Negative sequential patterns (NSPs), which capture both frequently occurring and nonoccurring behaviors, have become increasingly important and sometimes play a role that cannot be filled by analyzing occurring behaviors alone. Repetition sequential patterns capture repetitions of patterns across different sequences as well as within a single sequence, and are important for understanding the repetition relations between behaviors. Although some methods are available for mining NSPs and repetition positive sequential patterns (RPSPs), we have not found any method for mining repetition NSPs (RNSPs). RNSPs can help analysts further understand the repetition relationships between items and capture more comprehensive information with repetition properties. However, mining RNSPs is much more difficult than mining NSPs because of the intrinsic challenges posed by nonoccurring items. To address these issues, we first propose a formal definition of repetition negative containment. We then propose a method that converts repetition negative containment to repetition positive containment, so that repetition supports can be calculated quickly using only the corresponding RPSPs' information, without rescanning the database. Finally, we propose an efficient algorithm, called e-RNSP, to mine RNSPs. To the best of our knowledge, e-RNSP is the first algorithm to mine RNSPs efficiently. Extensive experimental results on the first four real and synthetic datasets clearly show that e-RNSP can efficiently discover repetition negative patterns; results on the fifth dataset demonstrate the effectiveness of the RNSPs captured by the proposed method; and results on the remaining 16 datasets analyze the impact of data characteristics on the mining process.
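The idea of turning negative containment into positive containment can be illustrated with a small sketch. It is not the e-RNSP procedure itself: it assumes the simpler, sequence-level containment rule used in e-NSP-style mining, under which a data sequence contains <a ¬b c> when it contains <a c> and does not contain <a b c>. It also ignores within-sequence repetition and restricts every element to a single item. The toy database `db` and the helpers `contains` and `sid_set` are hypothetical names introduced only for this illustration.

```python
# Sketch only: sequence-level negative support derived from positive
# patterns, assuming an e-NSP-style containment rule (not the repetition
# variant defined in the paper).

def contains(seq, pattern):
    """Return True if `pattern` is a subsequence of `seq` (single-item elements)."""
    it = iter(seq)
    # `item in it` advances the iterator, so items must appear in order.
    return all(item in it for item in pattern)

def sid_set(db, pattern):
    """Return the ids of the sequences in `db` that contain `pattern`."""
    return {sid for sid, seq in db.items() if contains(seq, pattern)}

# Hypothetical toy database: sequence id -> ordered list of items.
db = {
    1: ["a", "b", "c"],
    2: ["a", "c"],
    3: ["a", "d", "c"],
    4: ["c", "a"],
    5: ["a", "c", "b", "c"],
}

# In a real miner these id sets would already be stored with the positive
# patterns; they are computed here once only to keep the example self-contained.
pos_ac = sid_set(db, ["a", "c"])
pos_abc = sid_set(db, ["a", "b", "c"])

# Support of <a not-b c> under the assumed rule: sequences that contain
# <a c> but not <a b c>. Only set operations are used, no further scan.
neg_sids = pos_ac - pos_abc
neg_support = len(neg_sids)
print(sorted(neg_sids), neg_support)
```

Under this assumed rule, sequence 5 does not count toward the negative pattern because it contains <a b c>, even though its first `a` and first `c` have no `b` between them. A repetition-aware definition would need to refine this, which is the part the paper formalizes.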
