Paper Information - Accelerating frequent itemset mining on graphics processing units

Accelerating frequent itemset mining on graphics processing units

In this paper we describe a new parallel Frequent Itemset Mining algorithm called “Frontier Expansion.” The implementation is optimized for high performance on a heterogeneous platform consisting of a shared-memory multiprocessor and multiple Graphics Processing Unit (GPU) coprocessors. Frontier Expansion is an improved data-parallel algorithm derived from the Equivalence Class Clustering (Eclat) method, in which a partial breadth-first search is used to expose as much parallelism as the available memory capacity allows. In our approach, the vertical transaction lists are stored in a “bitset” representation and operated on with wide bitwise operations across multiple threads on a GPU. We evaluate our approach using four NVIDIA Tesla GPUs and observe a 6–30× speedup relative to state-of-the-art sequential Eclat and FPGrowth implementations executed on a multicore CPU.
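The bitset idea in the abstract can be illustrated with a small CPU-only sketch. It is not the authors' GPU implementation: it assumes Python integers as bitsets (bit t is set when transaction t contains the item), counts support by bitwise AND followed by a population count, and expands the frontier fully breadth-first rather than with the memory-bounded partial search the abstract describes. The names `build_bitsets`, `support`, and `mine` are hypothetical and chosen for illustration only.

```python
def build_bitsets(transactions):
    """Vertical layout: map each item to an int whose bit t is set
    when transaction t contains that item."""
    bitsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets


def support(bits):
    """Number of transactions in the bitset (population count)."""
    return bin(bits).count("1")


def mine(transactions, min_support):
    """Return {itemset tuple: support} for all frequent itemsets.

    Assumption: every level is expanded in full (plain breadth-first),
    which ignores the memory limit a real implementation must respect.
    """
    bitsets = build_bitsets(transactions)
    # Frontier of frequent 1-itemsets, sorted so that itemsets sharing
    # a prefix (one Eclat equivalence class) are contiguous.
    frontier = sorted(
        ((item,), bits)
        for item, bits in bitsets.items()
        if support(bits) >= min_support
    )
    frequent = {}
    while frontier:
        for itemset, bits in frontier:
            frequent[itemset] = support(bits)
        next_frontier = []
        for i, (a, bits_a) in enumerate(frontier):
            for b, bits_b in frontier[i + 1:]:
                if a[:-1] != b[:-1]:
                    break  # left this equivalence class
                joined = bits_a & bits_b  # intersect transaction lists
                if support(joined) >= min_support:
                    next_frontier.append((a + (b[-1],), joined))
        frontier = next_frontier
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(mine(data, 3).items()):
        print(itemset, count)
```

Each candidate join in the inner loop is independent of the others, which is the property a data-parallel version would exploit; on a GPU the single-integer AND would instead be a wide bitwise operation split across many threads.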

Fan Zhang | Yan Zhang | Jason D. Bakos
