Mining colossal patterns with length constraints

Colossal pattern mining extracts patterns from databases that have many attributes and values but only a small number of instances. Although many efficient approaches for extracting colossal patterns have been proposed, they cannot be applied to colossal pattern mining with constraints. In this paper, we address the problem of extracting colossal patterns with length constraints. First, we define the problems of mining colossal patterns under a min-length constraint and under a max-length constraint, and then combine the two into a single length constraint. Next, we develop one property for efficiently pruning candidates during the mining process and another for quickly checking candidates. Based on these properties, we propose the Length Constraints for Colossal Pattern (LCCP) algorithm to extract colossal patterns with length constraints. Experiments comparing LCCP with other algorithms demonstrate its effectiveness.
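The abstract does not spell out LCCP's pruning or checking rules. As an illustration only, here is a minimal Python sketch of one common way to enforce length constraints during row enumeration. It assumes that colossal patterns are closed itemsets obtained as intersections of rows (transactions), and that `minsup`, `min_len` and `max_len` are user-supplied thresholds. The function name `mine_length_constrained` is hypothetical, and this is not the LCCP algorithm itself. The sketch relies on one property: adding a row to a row set can only shrink the intersected itemset. A candidate shorter than `min_len` can therefore be pruned together with all of its extensions, whereas a candidate longer than `max_len` must still be expanded, because its extensions may shrink into range.

```python
from typing import FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def mine_length_constrained(
    rows: List[Set[str]], minsup: int, min_len: int, max_len: int
) -> Set[Tuple[Itemset, int]]:
    """Hypothetical row-enumeration sketch (not LCCP): return every closed
    itemset X with support(X) >= minsup and min_len <= |X| <= max_len."""
    results: Set[Tuple[Itemset, int]] = set()

    def support(x: Itemset) -> int:
        # Number of rows that contain every item of x.
        return sum(1 for r in rows if x <= r)

    def dfs(start: int, current: Itemset) -> None:
        # Adding rows only shrinks the intersection, so once |X| < min_len
        # no extension of this row set can satisfy the min-length constraint.
        if len(current) < min_len:
            return
        sup = support(current)
        # The max-length constraint is only checked here, never used to prune.
        if sup >= minsup and len(current) <= max_len:
            results.add((current, sup))
        for j in range(start, len(rows)):
            dfs(j + 1, current & frozenset(rows[j]))

    for i, row in enumerate(rows):
        dfs(i + 1, frozenset(row))
    return results
```

For example, with rows {a,b,c,d}, {a,b,c}, {a,b,d}, {a,c,d}, minsup = 2, min_len = 2 and max_len = 3, the sketch returns six closed itemsets: abc, abd and acd with support 2, and ab, ac and ad with support 3. The itemset {a} is excluded by the min-length constraint, and {a,b,c,d} by both the max-length constraint and the support threshold. The search is exponential in the number of rows, which is tolerable only because colossal-pattern datasets have few rows.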
