MaxDomino: Efficiently Mining Maximal Sets

We present MaxDomino, an algorithm for mining maximal frequent sets using the novel concept of the dominancy factor of a transaction. We also propose a hashing scheme that collapses the database into a form containing only unique transactions. Unlike the traditional bottom-up approach with look-aheads, MaxDomino employs a top-down strategy with a selective bottom-up search for mining maximal sets. Using the connect dataset (a benchmark dataset from the University of California, Irvine), our experimental results show that MaxDomino outperforms GenMax at higher support levels. Furthermore, our scalability tests show that MaxDomino yields an order-of-magnitude speedup over GenMax. MaxDomino is especially efficient when the maximal frequent sets are long.
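The abstract does not define the dominancy factor or the exact hashing scheme, so the following Python sketch is only a generic illustration of the two ideas it names, not MaxDomino itself. It assumes that collapsing means hashing each transaction (as a `frozenset`) into a dictionary that keeps one copy per distinct transaction together with a count, so that supports remain computable. It then runs a plain top-down search: start from the distinct transactions, which are the longest possible candidates, and remove one item at a time only from candidates that are infrequent. The helper names `collapse`, `support`, and `maximal_frequent_top_down` are hypothetical, and `min_support` is assumed to be an absolute transaction count of at least 1.

```python
from collections import Counter


def collapse(transactions):
    """Hash each transaction (as a frozenset) and keep one entry per distinct
    transaction with its multiplicity. Assumption: the count is stored so
    support can still be computed on the collapsed database."""
    return Counter(frozenset(t) for t in transactions)


def support(itemset, collapsed):
    """Number of original transactions that contain `itemset`."""
    return sum(count for t, count in collapsed.items() if itemset <= t)


def maximal_frequent_top_down(transactions, min_support):
    """Generic top-down search for maximal frequent itemsets (illustrative,
    hypothetical; this is NOT MaxDomino's dominancy-factor method).
    Assumes min_support is an absolute count >= 1."""
    collapsed = collapse(transactions)
    frequent, seen = set(), set()
    # Every frequent itemset is a subset of some transaction, so the
    # distinct transactions are the top-level candidates.
    stack = list(collapsed)
    while stack:
        cand = stack.pop()
        if not cand or cand in seen:
            continue
        seen.add(cand)
        if support(cand, collapsed) >= min_support:
            frequent.add(cand)  # frequent: no need to shrink further
        else:
            # infrequent: try every subset with exactly one item removed
            stack.extend(cand - {item} for item in cand)
    # Keep only the frequent sets that have no frequent proper superset.
    return {s for s in frequent if not any(s < other for other in frequent)}
```

For example, with the transactions `[["a","b","c"], ["a","b","c"], ["a","b"], ["b","d"], ["a","c"]]` the collapsed database has four entries, and a minimum support of 3 yields the two maximal sets {a, b} and {a, c}. This sketch is exponential in the worst case and omits the dominancy factor and the selective bottom-up search described in the abstract.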
