Accelerating frequent itemset mining on graphics processing units

In this paper, we describe a new parallel Frequent Itemset Mining algorithm called “Frontier Expansion.” The implementation is optimized for high performance on a heterogeneous platform consisting of a shared-memory multiprocessor and multiple Graphics Processing Unit (GPU) coprocessors. Frontier Expansion is an improved data-parallel algorithm derived from the Equivalence Class Clustering (Eclat) method, in which a partial breadth-first search is used to expose as much parallelism as possible within the available memory capacity. In our approach, the vertical transaction lists are stored in a “bitset” representation and processed with wide bitwise operations across multiple threads on a GPU. We evaluate our approach on four NVIDIA Tesla GPUs and observe a 6–30× speedup relative to state-of-the-art sequential Eclat and FPGrowth implementations executed on a multicore CPU.
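To make the bitset idea concrete, the following is a minimal CPU-side sketch in Python, not the implementation described in the paper. It assumes each item's vertical transaction list is stored as an integer whose bit t is set when transaction t contains the item, so the support of a joined itemset is the population count of a bitwise AND. The function names (`build_bitsets`, `support`, `expand_frontier`, `mine`) are hypothetical, and the sketch expands every level in full rather than modeling the memory-constrained partial breadth-first search or any GPU threading.

```python
from itertools import combinations


def build_bitsets(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    bitsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets


def support(bits):
    """Number of transactions in the bitset (population count)."""
    return bin(bits).count("1")


def expand_frontier(frontier, min_support):
    """Join pairs of itemsets that share a prefix (Eclat-style equivalence class).

    `frontier` is a sorted list of (itemset_tuple, bitset) of equal length.
    On a GPU, the AND and popcount for each pair would be spread across threads;
    here they are done one pair at a time (assumption for illustration).
    """
    next_frontier = []
    for (a, bits_a), (b, bits_b) in combinations(frontier, 2):
        if a[:-1] == b[:-1]:
            joined = bits_a & bits_b  # intersect the two transaction sets
            if support(joined) >= min_support:
                next_frontier.append((a + (b[-1],), joined))
    return next_frontier


def mine(transactions, min_support):
    """Return {itemset_tuple: support} for all frequent itemsets, level by level."""
    bitsets = build_bitsets(transactions)
    frontier = sorted(
        ((item,), bits)
        for item, bits in bitsets.items()
        if support(bits) >= min_support
    )
    result = {}
    while frontier:
        for itemset, bits in frontier:
            result[itemset] = support(bits)
        frontier = expand_frontier(frontier, min_support)
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in mine(data, min_support=3).items():
        print(itemset, count)
```

With a minimum support of 3 on the five toy transactions above, the pairs are frequent but the triple is pruned, because its intersected bitset has only two bits set. A real implementation would pack the bitsets into fixed-width machine words so that each thread can AND and popcount one word at a time.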
