Mining maximal hyperclique pattern: A hybrid search strategy

A hyperclique pattern is a new type of association pattern that contains items which are highly affiliated with each other. Specifically, the presence of an item in one transaction strongly implies the presence of every other item that belongs to the same hyperclique pattern. In this paper, we present an algorithm for mining maximal hyperclique patterns, which provide a more compact representation of hyperclique patterns and are desirable for many applications, such as pattern-based clustering. Our algorithm exploits key advantages of both the Depth First Search (DFS) strategy and the Breadth First Search (BFS) strategy. In particular, we adapt the equivalence pruning method, one of the most efficient pruning methods of the DFS strategy, into the process of the BFS strategy. Our experimental results show that our algorithm can be orders of magnitude faster than standard maximal frequent pattern mining algorithms, particularly at low levels of support.
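To make the notion of a maximal hyperclique pattern concrete, the following is a minimal sketch, not the hybrid DFS/BFS algorithm of the paper. It assumes the h-confidence measure used in the hyperclique literature (the support of an itemset divided by the largest support of any single item in it), which the abstract itself does not spell out, and it enumerates candidates by brute force, so it is only suitable for toy data. The function names, the `min_hconf` parameter, and the sample transactions are all illustrative assumptions.

```python
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def h_confidence(itemset, transactions):
    """Assumed definition: supp(P) / max over items i in P of supp({i}).

    Using raw counts instead of fractions gives the same ratio.
    """
    max_item_count = max(count([i], transactions) for i in itemset)
    if max_item_count == 0:
        return 0.0
    return count(itemset, transactions) / max_item_count


def maximal_hypercliques(transactions, min_hconf):
    """Brute-force enumeration (exponential; for illustration only).

    Candidates are visited from largest to smallest, so a candidate that
    passes the threshold is maximal exactly when it is not contained in
    an already reported pattern. Single items are skipped because their
    h-confidence is trivially 1.
    """
    items = sorted(set().union(*transactions))
    found = []
    for size in range(len(items), 1, -1):
        for cand in combinations(items, size):
            c = frozenset(cand)
            if any(c < m for m in found):
                continue  # a reported superset already covers this set
            if h_confidence(c, transactions) >= min_hconf:
                found.append(c)
    return found


# Hypothetical toy data: six transactions over items a..e.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"c", "d"},
    {"d", "e"},
    {"d", "e"},
]

for pattern in maximal_hypercliques(transactions, min_hconf=0.6):
    print(sorted(pattern), round(h_confidence(pattern, transactions), 3))
```

On this toy data, a threshold of 0.6 keeps {a, b, c} and {d, e} as maximal patterns, while raising it to 0.9 leaves only {a, b}. This illustrates why the maximal set is a more compact representation: every subset of a reported pattern with at least two items also satisfies the threshold and need not be listed separately.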
