Memory-Aware Frequent k-Itemset Mining

In this paper we show that the well-known problem of computing frequent k-itemsets (i.e., itemsets of cardinality k) in a given dataset can be reduced to the problem of finding iceberg queries in a stream of queries suitably constructed from the original dataset. Hence, algorithms for computing frequent k-itemsets can be obtained by adapting algorithms for computing iceberg queries. We show that, for sparse datasets, this can be done directly, i.e., without generating frequent x-itemsets for each x < k [...] 3). An important feature of the algorithm is that the amount of main memory it requires can be determined in advance, and it is shown to be very low for sparse datasets. Experiments show that, for very large datasets with millions of small transactions, our proposal outperforms the state-of-the-art algorithms. Furthermore, we sketch a first extension of our algorithm that works over data streams.
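
To make the reduction concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes each transaction is expanded into the stream of its k-subsets, and it uses a standard Misra-Gries style frequent-elements counter as the iceberg-query routine. The function names (`k_subsets`, `misra_gries`, `frequent_k_itemsets`) and the absolute threshold `min_count` are illustrative choices. The number of counters is fixed before the counting pass, which mirrors the claim that memory can be determined in advance.

```python
from collections import Counter
from itertools import combinations
from math import comb


def k_subsets(transactions, k):
    """Turn the dataset into a stream of k-itemsets, one per k-subset of each transaction."""
    for t in transactions:
        yield from combinations(sorted(set(t)), k)


def misra_gries(stream, m):
    """Keep at most m counters; every element occurring more than N/(m+1)
    times in a stream of length N is guaranteed to survive."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < m:
            counters[x] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


def frequent_k_itemsets(transactions, k, min_count):
    """Return {itemset: count} for k-itemsets with count >= min_count,
    without computing any smaller frequent itemsets first.
    Assumes `transactions` can be iterated several times (e.g. a list)."""
    # Pass 0: the stream length depends only on transaction sizes.
    n = sum(comb(len(set(t)), k) for t in transactions)
    # m = n // min_count gives n / (m + 1) < min_count, so no frequent itemset is lost.
    m = n // min_count
    # Pass 1: bounded-memory candidate selection (may contain false positives).
    candidates = misra_gries(k_subsets(transactions, k), m)
    # Pass 2: exact counts for the surviving candidates only.
    exact = Counter(s for s in k_subsets(transactions, k) if s in candidates)
    return {s: c for s, c in exact.items() if c >= min_count}
```

For sparse data with short transactions, the stream length n stays small relative to the dataset, so the counter budget n // min_count stays small as well. This is one plausible reading of why the memory footprint would be low in that regime.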
