BitTableFI: An efficient mining frequent itemsets algorithm

Mining frequent itemsets in transaction databases, time-series databases, and many other kinds of databases is an important task that has been widely studied in data mining research. The problem can be solved by first constructing a candidate set of itemsets and then identifying the itemsets within that set that meet the frequent-itemset requirement. Most previous research focuses on pruning to reduce the number of candidate itemsets and the number of database scans. However, many algorithms adopt Apriori-like candidate itemset generation and support counting, which is the most time-consuming part of the process. To address this issue, the paper proposes an efficient algorithm named BitTableFI. In the algorithm, a special data structure, BitTable, is used horizontally and vertically to compress the database for quick candidate itemset generation and quick support counting, respectively. The approach can also be applied to many other Apriori-like algorithms to improve their performance. Experiments with both synthetic and real databases show that BitTableFI outperforms Apriori and CBAR, which uses a ClusterTable for quick support counting.
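
The idea of using bitmaps in two orientations can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify BitTable's layout, so the encoding below is an assumption. Each itemset is a bitmask over items (the horizontal view, used to join candidates), and each item has a bitmask over transactions (the vertical view, used to count support with bitwise AND). All function names are hypothetical.

```python
from itertools import combinations


def build_columns(transactions, items):
    """Vertical view: one integer per item; bit t is set if transaction t has the item."""
    index = {item: i for i, item in enumerate(items)}
    columns = [0] * len(items)
    for t, transaction in enumerate(transactions):
        for item in transaction:
            columns[index[item]] |= 1 << t
    return columns


def support(itemset_mask, columns):
    """AND the transaction bitmaps of every item in the itemset, then count set bits."""
    rows = -1  # all bits set; any AND with a column narrows it to real transactions
    for i, column in enumerate(columns):
        if itemset_mask >> i & 1:
            rows &= column
    return bin(rows).count("1") if rows >= 0 else 0


def mine_frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-like) search with itemsets encoded as bitmasks (assumed encoding)."""
    items = sorted({item for t in transactions for item in t})
    columns = build_columns(transactions, items)

    # Level 1: frequent single items, stored as mask -> support.
    level = {}
    for i in range(len(items)):
        s = support(1 << i, columns)
        if s >= min_support:
            level[1 << i] = s

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        candidates = set()
        # Horizontal view: two k-itemsets join when their OR has exactly k+1 items.
        for a, b in combinations(level, 2):
            union = a | b
            if bin(union).count("1") != k + 1:
                continue
            # Apriori pruning: every k-subset of the candidate must be frequent.
            if all((union & ~(1 << j)) in level
                   for j in range(len(items)) if union >> j & 1):
                candidates.add(union)
        level = {}
        for c in candidates:
            s = support(c, columns)
            if s >= min_support:
                level[c] = s
        k += 1

    # Decode bitmasks back to item sets for readability.
    return {
        frozenset(items[j] for j in range(len(items)) if mask >> j & 1): s
        for mask, s in frequent.items()
    }


if __name__ == "__main__":
    sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(mine_frequent_itemsets(sample, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

In this sketch no database rescan is needed after the bitmaps are built: support counting reduces to integer AND and a bit count, which is the kind of saving the abstract attributes to the vertical use of BitTable.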
