HDminer: Efficient Mining of High Dimensional Frequent Closed Patterns from Dense Data

Frequent closed pattern mining has been studied for decades, mostly on two-dimensional data matrices. This paper addresses the problem of mining high-dimensional frequent closed patterns (nFCPs) from dense binary datasets, where each dataset is represented as a high-dimensional cube. Because existing algorithms based on FP-trees or enumeration trees are not suited to dense n-dimensional data, we propose a novel algorithm, HDminer, for mining nFCPs. HDminer employs effective search-space partitioning and pruning strategies to improve mining efficiency. We have implemented HDminer, and performance studies on synthetic data and real microarray data show that it outperforms existing algorithms.
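To make the notion of an nFCP concrete, here is a minimal brute-force sketch. It assumes the usual definition from the closed-cube literature rather than anything HDminer-specific. Under that definition, a pattern picks one subset of indices per dimension and must cover only 1-cells of the cube. It is closed when no index can be added to any dimension without covering a 0-cell. It is frequent when each subset meets a per-dimension minimum size. The names `all_ones`, `is_closed`, `mine_nfcps` and the `min_sizes` parameter are hypothetical. This is not HDminer's search strategy.

```python
from itertools import combinations, product
from typing import FrozenSet, Iterator, List, Sequence, Set, Tuple

# Hypothetical representation: a pattern is one frozenset of indices per dimension.
Pattern = Tuple[FrozenSet[int], ...]
Cell = Tuple[int, ...]


def all_ones(cells: Set[Cell], pattern: Pattern) -> bool:
    """True if every cell in the Cartesian product of the pattern's subsets is a 1-cell."""
    return all(cell in cells for cell in product(*pattern))


def is_closed(cells: Set[Cell], shape: Sequence[int], pattern: Pattern) -> bool:
    """A pattern is closed if no single index can be added to any dimension
    while the product still contains only 1-cells. Checking single-index
    extensions suffices: any all-ones superset implies one of them is all-ones."""
    for d, size in enumerate(shape):
        for i in range(size):
            if i in pattern[d]:
                continue
            extended = pattern[:d] + (pattern[d] | {i},) + pattern[d + 1:]
            if all_ones(cells, extended):
                return False
    return True


def subsets(n: int, min_size: int) -> Iterator[FrozenSet[int]]:
    """All subsets of range(n) with at least min_size elements."""
    for k in range(min_size, n + 1):
        for combo in combinations(range(n), k):
            yield frozenset(combo)


def mine_nfcps(cells: Set[Cell], shape: Sequence[int],
               min_sizes: Sequence[int]) -> List[Pattern]:
    """Exhaustively enumerate frequent closed n-dimensional patterns.
    Assumption: 'frequent' means each dimension's subset has at least
    min_sizes[d] indices (each threshold must be >= 1)."""
    assert all(m >= 1 for m in min_sizes), "each minimum size must be >= 1"
    candidates = [list(subsets(n, m)) for n, m in zip(shape, min_sizes)]
    results: List[Pattern] = []
    for pattern in product(*candidates):
        if all_ones(cells, pattern) and is_closed(cells, shape, pattern):
            results.append(pattern)
    return results
```

For example, a 2×2×2 cube in which every cell is 1 except `(1, 1, 1)` has three closed patterns with `min_sizes=(1, 1, 1)`: one of the dimensions is restricted to index 0 and the other two keep both indices. The enumeration is exponential in every dimension, which is exactly the cost that the search-space partitioning and pruning described above are meant to avoid on dense data.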