Paper information - An improvement for dEclat algorithm

An improvement for dEclat algorithm

The diffset format (the difference of two sets) has drastically reduced the running time and memory usage of the Eclat algorithm; Eclat using the diffset format is called the dEclat algorithm. However, on some sparse datasets the diffset format loses its advantage over the tidset format (sets of transaction IDs), and in that case it is suggested to start with the tidset format and switch to the diffset format later. In this paper, we present a novel approach, a combination of tidset and diffset, which uses both formats to represent transaction databases in frequent itemset mining. This approach can fully exploit the advantages of both tidset and diffset. Furthermore, it does not require conversion of tidsets to diffset format. Preliminary results show that Eclat using this combination approach used less memory and was faster than dEclat on most datasets. We also introduce an improvement for the dEclat algorithm: by sorting diffsets and tidsets, the memory usage and running time of dEclat can be reduced significantly.
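To make the two formats concrete, the following is a minimal sketch of how support can be computed from tidsets and from diffsets, following the standard dEclat formulation (for itemsets PX and PY sharing a prefix P: d(PXY) = d(PY) − d(PX) and sup(PXY) = sup(PX) − |d(PXY)|). It does not reproduce the combination approach or the sorting improvement described in the abstract; the toy database and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical toy database (not from the paper): transaction ID -> items.
TRANSACTIONS: Dict[int, Set[str]] = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"a", "b", "c"},
    5: {"b", "c"},
}


def tidset(item: str) -> Set[int]:
    """Tidset of a single item: IDs of the transactions that contain it."""
    return {tid for tid, items in TRANSACTIONS.items() if item in items}


def support_via_tidsets(items: str) -> int:
    """Support as the size of the intersection of the items' tidsets."""
    tids = set(TRANSACTIONS)
    for item in items:
        tids &= tidset(item)
    return len(tids)


def support_abc_via_diffsets() -> int:
    """Support of {a, b, c} using diffsets relative to the prefix 'a'."""
    t_a, t_b, t_c = tidset("a"), tidset("b"), tidset("c")
    # Diffset of a 2-itemset: transactions containing the prefix but not the extension.
    d_ab = t_a - t_b
    d_ac = t_a - t_c
    sup_ab = len(t_a) - len(d_ab)
    # Diffset of the 3-itemset from two sibling diffsets: d(abc) = d(ac) - d(ab).
    d_abc = d_ac - d_ab
    # Support shrinks by the number of transactions lost in this extension.
    return sup_ab - len(d_abc)
```

On this toy data, both routes agree on the support of {a, b, c}, while the diffsets involved hold one ID each against three or four IDs per tidset. That size gap is the reason diffsets tend to pay off on dense data; on sparse data the diffsets can be larger than the tidsets, which is the case the abstract addresses.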

Yoshitoshi Kunieda | Tuan A. Trieu

[1] Devavrat Shah, et al. Turbo-charging vertical mining of large databases, 2000, SIGMOD '00.

[2] Tomasz Imielinski, et al. Mining association rules between sets of items in large databases, 1993, SIGMOD Conference.

[3] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[4] Jian Pei, et al. Mining frequent patterns without candidate generation, 2000, SIGMOD '00.

[5] Srinivasan Parthasarathy, et al. New Algorithms for Fast Discovery of Association Rules, 1997, KDD.

[6] Mohammed J. Zaki, et al. Fast vertical mining using diffsets, 2003, KDD '03.