A Tree Partitioning Method for Memory Management in Association Rule Mining

All methods of association rule mining must first compute the frequent sets of items: those that occur together sufficiently often to be the basis of potentially interesting rules. The cost of this computation increases in proportion to the database size, and also with its density, because densely-populated databases can give rise to very large numbers of candidates that must be counted. Both factors cause performance problems, especially when the data structures involved become too large for primary memory. In this paper we describe a method of partitioning that organises the data into tree structures that can be processed independently. We present experimental results showing that the method scales well for increasing dimensions of data, and performs significantly better than alternatives, especially when dealing with dense data and low support thresholds.
