Enhancing the Apriori Algorithm for Frequent Set Counting

In this paper we propose DCP, a new algorithm for solving the Frequent Set Counting problem that enhances Apriori. Our goal was to optimize the initial iterations of Apriori, which are the most time-consuming ones on datasets characterized by short or medium-length frequent patterns. The main improvements are an innovative method for storing candidate sets of items and counting their support, and effective pruning techniques that significantly reduce the size of the dataset as execution progresses.
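
To make the idea of shrinking the dataset between iterations concrete, the following is a minimal sketch of a level-wise Apriori loop with a simple form of transaction pruning. It is not the DCP algorithm: the abstract does not specify DCP's candidate storage or its pruning rules, so the function name `apriori_with_pruning`, the dictionary-based counting, and the two pruning rules used here (drop items that occur in no frequent k-itemset, then drop transactions with fewer than k+1 remaining items) are illustrative assumptions.

```python
from itertools import combinations


def apriori_with_pruning(transactions, min_support):
    """Return {itemset: support} for every frequent itemset.

    Illustrative sketch only: a generic Apriori with dataset pruning,
    not the DCP data structures or pruning rules.
    """
    dataset = [frozenset(t) for t in transactions]

    # Iteration 1: count the support of single items.
    counts = {}
    for t in dataset:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Pruning (assumed rules): an item that appears in no frequent
        # k-itemset cannot appear in a frequent (k+1)-itemset, and a
        # transaction with fewer than k+1 items supports no candidate.
        live_items = frozenset().union(*frequent)
        dataset = [t & live_items for t in dataset]
        dataset = [t for t in dataset if len(t) >= k + 1]

        # Candidate generation: join frequent k-itemsets and keep only
        # unions whose k-subsets are all frequent.
        prev = set(frequent)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k + 1 and all(
                    frozenset(s) in prev for s in combinations(union, k)
                ):
                    candidates.add(union)

        # Support counting over the pruned dataset.
        counts = {c: 0 for c in candidates}
        for t in dataset:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result
```

In this sketch the dataset can only shrink from one iteration to the next, so later counting passes scan fewer and shorter transactions. The naive subset test in the counting loop is the part an optimized candidate storage scheme would replace.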
