An Algorithm for Mining Association Rules Using Perfect Hashing and Database Pruning

In this paper, we propose an algorithm for finding frequent itemsets in transaction databases. Our algorithm is inspired by the Direct Hashing and Pruning (DHP) algorithm, which is itself a variation of the well-known Apriori algorithm. In DHP, a hash table is used to reduce the number of candidate (k+1)-itemsets generated at each step. Our algorithm differs in that it uses perfect hashing to build the hash table for the candidate (k+1)-itemsets. Because the hashing is perfect, no two candidates share a bucket, so the hash table holds the actual count of each candidate (k+1)-itemset. Hence no extra processing is needed to count the occurrences of candidate (k+1)-itemsets, as it is in DHP. The algorithm also prunes the database at each step to reduce the search space. We tested our algorithm on real datasets obtained from a large retailing company and observed that it performs better than the Apriori algorithm.
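The following Python code is a minimal sketch of the two ideas described above, not the authors' implementation. The abstract does not say which perfect hash function is used. Here we assume the combinatorial number system (colexicographic rank), which maps every (k+1)-subset of the m currently frequent items to a distinct slot in a table of size C(m, k+1). We also assume a simple pruning rule: drop items that occur in no frequent k-itemset, and drop transactions too short to contain a (k+1)-itemset. The names `colex_rank`, `count_with_perfect_hash`, and `mine` are hypothetical.

```python
from itertools import combinations
from math import comb


def colex_rank(subset):
    """Assumed perfect hash: for sorted ints c_0 < c_1 < ... < c_{k-1},
    rank = sum_i C(c_i, i + 1). Every k-subset of {0..m-1} gets a distinct
    slot in [0, C(m, k)), so itemsets never collide."""
    return sum(comb(c, i + 1) for i, c in enumerate(subset))


def count_with_perfect_hash(transactions, item_ids, size):
    """Count every `size`-itemset over the ids in `item_ids` in a
    direct-address table indexed by colex_rank; the counts are exact."""
    table = [0] * comb(len(item_ids), size)
    for t in transactions:
        ids = sorted(item_ids[x] for x in t if x in item_ids)
        for subset in combinations(ids, size):
            table[colex_rank(subset)] += 1
    return table


def mine(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    counts = {}
    for t in transactions:
        for x in set(t):
            counts[x] = counts.get(x, 0) + 1
    frequent = {frozenset([x]): c for x, c in counts.items() if c >= min_support}
    result = dict(frequent)
    db = [set(t) for t in transactions]
    k = 1
    while frequent:
        # Hypothetical pruning rule: keep only items that appear in some
        # frequent k-itemset, then drop transactions with fewer than k+1
        # such items (they cannot contain any (k+1)-itemset).
        live = set().union(*frequent)
        db = [t & live for t in db]
        db = [t for t in db if len(t) >= k + 1]
        items = sorted(live)
        item_ids = {x: i for i, x in enumerate(items)}
        table = count_with_perfect_hash(db, item_ids, k + 1)
        # Counts are exact, so a threshold check is enough to decide frequency.
        frequent = {}
        for subset in combinations(range(len(items)), k + 1):
            c = table[colex_rank(subset)]
            if c >= min_support:
                frequent[frozenset(items[i] for i in subset)] = c
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    for itemset, support in sorted(mine(data, 3).items(), key=lambda p: (len(p[0]), sorted(p[0]))):
        print(sorted(itemset), support)
```

With the sample data and a minimum support of 3, the pairs {a, b}, {a, c} and {b, c} are each reported with support 3, and {a, b, c} is rejected because it occurs in only two transactions. A direct-address table of size C(m, k+1) trades memory for collision-free counting, so it fits best when pruning keeps m small. The hash function in the paper may be organized differently.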