NetDAP: (δ, γ)-approximate pattern matching with length constraints

Pattern matching (PM) with gap constraints has been applied to compute the support of a pattern in a sequence, an essential task in repetitive sequential pattern mining (also called sequence pattern mining). Compared with exact PM, approximate PM tolerates data noise (differences) between the pattern and the matched subsequence, so more valuable patterns can be found. Approximate PM with gap constraints mainly adopts the Hamming distance to measure the degree of approximation. The Hamming distance reflects only the number of differing characters between two sequences and ignores the distance between those characters. Hence, this paper addresses (δ, γ)-approximate PM with length constraints, which employs local-global constraints to improve the accuracy of PM: the maximal distance between two corresponding characters is less than or equal to the local threshold δ, and the sum of all these distances is less than or equal to the global threshold γ. To tackle the problem, this paper proposes an online algorithm named NetDAP, which employs a specially designed data structure named the approximate single-leaf Nettree. An approximate single-leaf Nettree can be created by adopting dynamic programming to determine the range of rootleaf, the minimal root, the maximal root, the range of nodes for each level, and the range of parents for each node. To improve performance, two pruning strategies are proposed to prune the nodes and the parent-child relationships that do not satisfy the δ and γ distance constraints, respectively. Finally, extensive experimental results on real protein data sets and time series verify the performance of the proposed algorithm.
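The difference between the Hamming distance and the (δ, γ) local-global constraints can be illustrated with a minimal sketch. It is not the NetDAP algorithm and builds no Nettree; it only checks the two thresholds for a pattern and an equal-length candidate subsequence. The function names `hamming` and `delta_gamma_match` are hypothetical, and measuring the distance between two characters as the absolute difference of their code points is an assumption made here for illustration.

```python
def hamming(pattern: str, candidate: str) -> int:
    """Count positions at which the two equal-length strings differ."""
    return sum(a != b for a, b in zip(pattern, candidate))


def delta_gamma_match(pattern: str, candidate: str, delta: int, gamma: int) -> bool:
    """Check the local (delta) and global (gamma) distance thresholds.

    Assumption: the distance between two characters is the absolute
    difference of their code points.
    """
    if len(pattern) != len(candidate):
        return False
    dists = [abs(ord(a) - ord(b)) for a, b in zip(pattern, candidate)]
    local_ok = max(dists, default=0) <= delta   # every per-character distance <= delta
    global_ok = sum(dists) <= gamma             # sum of all distances <= gamma
    return local_ok and global_ok


pattern = "bdf"
near = "cdg"   # two characters differ, each by 1
far = "zdf"    # one character differs, by 24

print(hamming(pattern, near), delta_gamma_match(pattern, near, delta=1, gamma=2))
print(hamming(pattern, far), delta_gamma_match(pattern, far, delta=1, gamma=2))
```

In this example the Hamming distance ranks `far` as the closer candidate (1 differing character versus 2), whereas the (δ, γ) check with δ = 1 and γ = 2 accepts `near` and rejects `far`, because it accounts for how far apart the differing characters are.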
