GMiner: A fast GPU-based frequent itemset mining method for large-scale data

Abstract. Frequent itemset mining is widely used as a fundamental data mining technique. However, as data size increases, the relatively slow performance of existing methods hinders its applicability. Although many sequential frequent itemset mining methods have been proposed, there is a clear limit to the performance that a single thread can achieve. To overcome this limitation, various parallel methods have been proposed that use multi-core CPUs, multiple machines, or many-core graphics processing units (GPUs). However, these methods still have drawbacks, including relatively slow performance, data size limitations, and poor scalability due to workload skewness. In this paper, we propose GMiner, a fast GPU-based frequent itemset mining method for large-scale data. GMiner achieves high performance by fully exploiting the computational power of GPUs. The method performs mining tasks in a counterintuitive way: it mines the patterns from the first level of the enumeration tree rather than storing and utilizing the patterns at the intermediate levels of the tree. This approach is quite effective in terms of both performance and memory use on the GPU architecture. In addition, GMiner solves the workload skewness problem from which existing parallel methods suffer; as a result, its performance increases almost linearly with the number of GPUs. Through extensive experiments, we demonstrate that GMiner significantly outperforms other representative sequential and parallel methods in most cases, by orders of magnitude on the tested benchmarks.
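The idea of mining from the first level of the enumeration tree can be illustrated with a minimal CPU-only sketch. It assumes a vertical bitmap layout (one bit per transaction for each item) and computes the support of every candidate by intersecting only the 1-item bitmaps, so no bitmaps for intermediate-level patterns are stored. The function names `build_item_bitmaps`, `support_from_first_level`, and `mine` are hypothetical, and the code is not GMiner's implementation; it only shows the trade of extra intersections for lower memory use that the abstract describes.

```python
from itertools import combinations


def build_item_bitmaps(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_from_first_level(itemset, bitmaps):
    """Count transactions containing every item, using only the 1-item bitmaps."""
    items = iter(itemset)
    acc = bitmaps[next(items)]
    for item in items:
        acc &= bitmaps[item]  # intersect transaction sets
    return bin(acc).count("1")  # population count


def mine(transactions, min_support):
    """Level-wise mining that never stores bitmaps of intermediate patterns."""
    bitmaps = build_item_bitmaps(transactions)
    frequent = {}
    for item, bitmap in bitmaps.items():
        support = bin(bitmap).count("1")
        if support >= min_support:
            frequent[(item,)] = support

    level = sorted(frequent)
    k = 2
    while level:
        previous = set(level)
        items = sorted({item for itemset in level for item in itemset})
        next_level = []
        for candidate in combinations(items, k):
            # Apriori pruning: every (k-1)-subset must already be frequent.
            if all(sub in previous for sub in combinations(candidate, k - 1)):
                support = support_from_first_level(candidate, bitmaps)
                if support >= min_support:
                    frequent[candidate] = support
                    next_level.append(candidate)
        level = next_level
        k += 1
    return frequent
```

On a GPU, each candidate's intersection is independent of the others, which is what makes this formulation attractive for many-core hardware; the sketch above runs them sequentially for clarity.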
