An Efficient Colossal Closed Itemset Mining Algorithm for a Dataset with High Dimensionality.

Abstract Growing research interest in bioinformatics, together with the abundance of data now available across domains, has led to datasets with high dimensionality. Such datasets have a very large number of features and comparatively few rows. Frequent Colossal Closed Itemsets (FCCI) are significant for diverse applications, including bioinformatics, and play a prominent role in decision making. The amount of information that can be extracted from a high-dimensional dataset is huge, and extracting it is a non-trivial task. State-of-the-art algorithms do not prune all the inadmissible features and rows. The proposed work presents the pruning of all inadmissible features and rows, an efficient pruning strategy that cuts down the row enumeration search space, and a closure method for checking the closedness of a rowset. A row enumeration algorithm that combines the rowset closure check with the pruning strategy is designed to mine the complete set of FCCI efficiently. The experimental results demonstrate the effectiveness of pruning all the inadmissible features and rows.
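The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, only a toy instance built on stated assumptions: an "inadmissible" feature is taken to be an item with support below a hypothetical `minsup`, an "inadmissible" row is one left with fewer than a hypothetical `mincard` items, and the rowset closure check is the standard CARPENTER-style test [16] rather than the paper's own method. The function names `prune_dataset` and `mine_fcci` are likewise invented for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set


def prune_dataset(rows: Iterable[Set[str]], minsup: int, mincard: int) -> List[FrozenSet[str]]:
    """Repeatedly drop infrequent items and too-short rows until nothing changes.

    Assumption: a feature is inadmissible if its support is below minsup,
    and a row is inadmissible if it keeps fewer than mincard items.
    """
    current = [set(r) for r in rows]
    while True:
        counts: Dict[str, int] = {}
        for row in current:
            for item in row:
                counts[item] = counts.get(item, 0) + 1
        frequent = {item for item, c in counts.items() if c >= minsup}
        reduced = [row & frequent for row in current]
        reduced = [row for row in reduced if len(row) >= mincard]
        if reduced == current:  # fixed point reached
            return [frozenset(row) for row in reduced]
        current = reduced


def mine_fcci(rows: Iterable[Set[str]], minsup: int, mincard: int) -> Dict[FrozenSet[str], int]:
    """Row enumeration: intersect rows depth-first and record closed itemsets."""
    data = prune_dataset(rows, minsup, mincard)
    found: Dict[FrozenSet[str], int] = {}

    def extend(start: int, rowset: FrozenSet[int], itemset: Optional[FrozenSet[str]]) -> None:
        for i in range(start, len(data)):
            candidate = data[i] if itemset is None else itemset & data[i]
            # Cardinality pruning: intersecting more rows can only shrink the itemset.
            if len(candidate) < mincard:
                continue
            # Rowset closure: every row that contains the candidate itemset.
            cover = {j for j, row in enumerate(data) if candidate <= row}
            # Closure check: an earlier row outside the current rowset also
            # supports the candidate, so another branch reaches this itemset.
            if any(j < i and j not in rowset for j in cover):
                continue
            if len(cover) >= minsup:
                found[candidate] = len(cover)
            extend(i + 1, rowset | {i}, candidate)

    extend(0, frozenset(), None)
    return found


if __name__ == "__main__":
    toy_rows = [
        {"a", "b", "c", "d"},
        {"a", "b", "c", "e"},
        {"a", "b", "c", "d"},
        {"x", "y"},
    ]
    for itemset, support in mine_fcci(toy_rows, minsup=2, mincard=3).items():
        print(sorted(itemset), support)
```

Because every recorded itemset is an intersection of rows, it is closed by construction. The sketch enumerates rowsets exhaustively, so it is only practical for the small row counts typical of high-dimensional data.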
