Mining distinguishing subsequence patterns with nonoverlapping condition

Distinguishing subsequence pattern mining aims to discover the differences between classes of sequence databases and to express the characteristics of each class. It plays an important role in biomedicine, feature selection, time-series classification, and other areas. Existing methods for mining distinguishing subsequence patterns consider only whether a pattern appears in a sequence; they ignore both the number of times the pattern occurs in the sequence and its proportion within the entire sequence database, which hampers the discovery of distinguishing patterns when there are many irrelevant occurrences. We therefore propose an algorithm for mining distinguishing subsequence patterns under the nonoverlapping condition. Counting only nonoverlapping occurrences effectively reduces the number of irrelevant or redundant occurrences, so the number of occurrences of a pattern is captured more accurately. We also use a specially designed data structure, the Nettree, to avoid backtracking. In addition, we use the mined distinguishing patterns as classification features and carry out classification experiments on two-class DNA sequence and time-series data. Extensive experimental results and comparisons demonstrate the efficiency of the proposed algorithm and the correctness of the feature extraction.
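
To make the idea of counting nonoverlapping occurrences concrete, the following is a minimal Python sketch, not the algorithm proposed in the paper. It assumes that two occurrences are nonoverlapping when no sequence position is reused at the same pattern position, and that consecutive pattern characters may be separated by a bounded gap. The function names, the gap parameters `min_gap` and `max_gap`, and the per-character `density` measure are all hypothetical choices made for illustration. Unlike the Nettree approach described above, this sketch uses a simple leftmost-first search with backtracking, and it counts the occurrences that this greedy search finds without claiming the count is maximal.

```python
def count_nonoverlapping(seq, pattern, min_gap=0, max_gap=2):
    """Greedily count pairwise nonoverlapping occurrences of pattern in seq.

    Assumption: an occurrence may not reuse a sequence position at the
    same pattern index as an earlier occurrence. Gap bounds are hypothetical.
    """
    m = len(pattern)
    # used[j] holds sequence positions already matched to pattern[j]
    used = [set() for _ in range(m)]

    def extend(pos, j):
        # pos was matched to pattern[j - 1]; try to match pattern[j:]
        if j == m:
            return []
        lo = pos + 1 + min_gap
        hi = min(len(seq), pos + 2 + max_gap)  # exclusive upper bound
        for nxt in range(lo, hi):
            if seq[nxt] == pattern[j] and nxt not in used[j]:
                rest = extend(nxt, j + 1)
                if rest is not None:
                    return [nxt] + rest
        return None  # no valid continuation: backtrack

    count = 0
    for start in range(len(seq)):
        if seq[start] != pattern[0]:
            continue
        rest = extend(start, 1)
        if rest is not None:
            # record the occurrence so later ones cannot overlap it
            for j, p in enumerate([start] + rest):
                used[j].add(p)
            count += 1
    return count


def density(database, pattern, **gaps):
    """Hypothetical measure: nonoverlapping occurrences per character."""
    total_len = sum(len(s) for s in database)
    total_occ = sum(count_nonoverlapping(s, pattern, **gaps) for s in database)
    return total_occ / total_len if total_len else 0.0


positive = ["aabb"]   # toy class-1 database
negative = ["bbaa"]   # toy class-2 database
contrast = density(positive, "ab") - density(negative, "ab")
```

In this toy example the pattern `ab` has two nonoverlapping occurrences in `aabb` and none in `bbaa`, so the contrast between the two classes is positive. A presence-only measure would report the same value for a sequence with one occurrence as for a sequence with many.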
