Efficient Frequent Itemsets Mining by Sampling

As the first stage of discovering association rules, frequent itemset mining is an important and challenging task for large databases. Sampling provides an efficient way to obtain approximate answers in much less time. Based on the characteristics of frequent itemset counting, a new sampling bound is proposed. It requires fewer samples to achieve the required accuracy, which makes mining much more efficient than with traditional Chernoff bounds.
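The abstract does not state the new bound, so the sketch below illustrates only the traditional baseline it is compared against. It uses the additive Chernoff (Hoeffding) bound: with n transactions drawn uniformly at random, the sampled support of a fixed itemset differs from its true support by more than ε with probability at most 2·exp(−2nε²), so n ≥ ln(2/δ)/(2ε²) samples suffice. The function names, the `max_len` cap on itemset size, and the lowered threshold `min_sup - epsilon` are illustrative assumptions. They are not taken from the paper.

```python
import math
import random
from collections import Counter
from itertools import combinations


def chernoff_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2*exp(-2*n*epsilon**2) <= delta (additive Chernoff/Hoeffding bound).

    The guarantee is per itemset: a union bound over many itemsets needs a smaller delta.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def sample_supports(db, n, max_len=2, seed=0):
    """Estimate supports of all itemsets of size <= max_len from n transactions
    drawn uniformly with replacement (max_len is a hypothetical cap for brevity)."""
    rng = random.Random(seed)
    sample = [db[rng.randrange(len(db))] for _ in range(n)]
    counts = Counter()
    for t in sample:
        items = sorted(set(t))  # each itemset counted at most once per transaction
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return {s: c / n for s, c in counts.items()}


def frequent_from_sample(db, min_sup, epsilon, delta, max_len=2, seed=0):
    """Hypothetical helper: size the sample with the Chernoff bound, then keep itemsets
    whose estimated support reaches a threshold lowered by epsilon (assumed heuristic),
    so that truly frequent itemsets are unlikely to be missed."""
    n = chernoff_sample_size(epsilon, delta)
    est = sample_supports(db, n, max_len, seed)
    return {s: f for s, f in est.items() if f >= min_sup - epsilon}
```

For example, ε = 0.01 and δ = 0.01 give `chernoff_sample_size(0.01, 0.01) == 26492` transactions, independent of database size. Because the required n grows as 1/ε², a tighter bound like the one the abstract proposes translates directly into smaller samples and faster counting.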
