An Effective Hash-Based Algorithm for Mining Association Rules
Jong

In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally, this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining the large itemsets from a huge number of candidate large itemsets in the early iterations is usually the dominating factor in overall data mining performance. To address this issue, we propose an effective hash-based algorithm for candidate set generation. Specifically, the number of candidate itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that generating smaller candidate sets enables us to effectively trim the transaction database at a much earlier stage of the iterations, thereby significantly reducing the computational cost of later iterations. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
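To make the two ideas above concrete (hash-based filtering of candidate 2-itemsets, then trimming the transaction database), here is a minimal Python sketch. It is not the authors' implementation. It assumes items are small integers, uses a hypothetical hash function `bucket_of` with an arbitrary bucket count of 7, and applies a simplified trimming rule: an item is kept only if it occurs in at least k of the large k-itemsets contained in its transaction, and a transaction is kept only if at least k+1 items remain. The key observation is that a bucket's count is an upper bound on the support of every pair hashed into it, so pairs in buckets below the minimum support cannot be large and never need to be counted.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    # Hypothetical hash function (an assumption): any deterministic map
    # from a 2-itemset to a bucket index would serve the same purpose.
    a, b = pair
    return (a * 10 + b) % num_buckets


def dhp_first_two_passes(transactions, min_support, num_buckets=7):
    # Pass 1: count single items and, at the same time, hash every
    # 2-subset of each transaction into a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1

    large_1 = sorted(i for i, c in item_counts.items() if c >= min_support)
    # Without hashing, every pair of large 1-itemsets would be a candidate.
    unfiltered_c2 = list(combinations(large_1, 2))
    # Hash filter: a bucket count bounds the support of every pair in that
    # bucket, so pairs in light buckets cannot be large and are dropped.
    filtered_c2 = [p for p in unfiltered_c2
                   if bucket_counts[bucket_of(p, num_buckets)] >= min_support]

    # Pass 2: count support only for the surviving candidates.
    pair_counts = Counter()
    for t in transactions:
        items = set(t)
        for p in filtered_c2:
            if p[0] in items and p[1] in items:
                pair_counts[p] += 1
    large_2 = sorted(p for p in filtered_c2 if pair_counts[p] >= min_support)
    return large_1, unfiltered_c2, filtered_c2, large_2


def trim_transactions(transactions, large_k, k):
    # Simplified trimming rule (an assumption): every item of a large
    # (k+1)-itemset lies in k of its large k-subsets, so items occurring in
    # fewer than k contained large k-itemsets can be removed.
    trimmed = []
    for t in transactions:
        items = set(t)
        occurrences = Counter()
        for itemset in large_k:
            if set(itemset) <= items:
                occurrences.update(itemset)
        kept = sorted(i for i in items if occurrences[i] >= k)
        if len(kept) >= k + 1:  # too short to hold any (k+1)-itemset otherwise
            trimmed.append(kept)
    return trimmed


# Toy database of four transactions with minimum support 2.
transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
large_1, unfiltered_c2, filtered_c2, large_2 = dhp_first_two_passes(transactions, min_support=2)
print(len(unfiltered_c2), len(filtered_c2))       # 6 4
print(large_2)                                    # [(1, 3), (2, 3), (2, 5), (3, 5)]
print(trim_transactions(transactions, large_2, 2))  # [[2, 3, 5], [2, 3, 5]]
```

On this toy data the hash filter removes two of the six candidate pairs before the second pass, and trimming shrinks the database from four transactions to two. Both effects come from the same bucket counts and large-itemset results, which is why fewer candidates make earlier trimming possible.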