Association Rules Mining Using Heavy Itemsets

A well-known problem that limits the practical use of association rule mining algorithms is the extremely large number of rules they generate. So many rules make the algorithms inefficient and make the discovered rules difficult for end users to comprehend. We present the concept of a heavy itemset. An itemset A is heavy (for given support and confidence thresholds) if every possible association rule made up only of items in A holds, that is, satisfies both thresholds. We prove a simple necessary and sufficient condition for an itemset to be heavy. We present a formula for the number of possible rules for a given heavy itemset, and show that a heavy itemset compactly represents an exponential number of association rules. Along with two simple search algorithms, we present an efficient greedy algorithm that generates a collection of disjoint heavy itemsets from a given transaction database. We then present a modified Apriori algorithm that starts with a given collection of disjoint heavy itemsets and discovers more heavy itemsets, not necessarily disjoint from the given ones.
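The definition of a heavy itemset can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes the standard definitions of support and confidence, and it assumes a rule X → Y "made up of items only in A" means X and Y are disjoint, non-empty, and X ∪ Y ⊆ A. Under those assumptions, one condition that follows directly from the definition is that A (with at least two items) is heavy exactly when supp(A) ≥ s and supp(A) / max over a in A of supp({a}) ≥ c; the paper's own formulation may differ. The function names `support`, `is_heavy`, `is_heavy_brute_force`, and `rule_count` are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def is_heavy_brute_force(itemset, transactions, min_sup, min_conf):
    """Check the definition directly: every rule X -> Y over the itemset must hold.

    Assumption: X and Y are disjoint, non-empty, and X | Y is a subset of the itemset.
    """
    items = sorted(itemset)
    if len(items) < 2:
        return False  # no rule can be formed from fewer than two items
    for k in range(1, len(items)):
        for x in combinations(items, k):
            rest = [i for i in items if i not in x]
            for m in range(1, len(rest) + 1):
                for y in combinations(rest, m):
                    sup_xy = support(x + y, transactions)
                    sup_x = support(x, transactions)
                    # confidence test written without division: sup_xy / sup_x >= min_conf
                    if sup_xy < min_sup or sup_xy < min_conf * sup_x:
                        return False
    return True


def is_heavy(itemset, transactions, min_sup, min_conf):
    """Shortcut test (derived under the assumptions above, not quoted from the paper).

    The weakest rule has the most frequent single item as antecedent and all
    remaining items as consequent, so checking that one rule is enough.
    """
    items = frozenset(itemset)
    if len(items) < 2:
        return False
    sup_a = support(items, transactions)
    max_single = max(support({i}, transactions) for i in items)
    return sup_a >= min_sup and sup_a >= min_conf * max_single


def rule_count(n):
    """Number of rules X -> Y over an n-item set under the assumptions above.

    Each item goes to X, to Y, or to neither (3**n ways); subtract the
    assignments with X empty or Y empty (2**n each) and add back the one
    assignment counted twice (both empty).
    """
    return 3 ** n - 2 ** (n + 1) + 1
```

Under this counting, a heavy itemset of 10 items stands for `rule_count(10)` rules, which grows as 3^n, so a single itemset summarizes a number of rules that is exponential in its size.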
