A New Data Layout for Set Intersection on GPUs

Set intersection is at the core of a variety of problems, e.g. frequent item set mining and sparse boolean matrix multiplication. It is well known that, for some computational problems, large speed gains can be obtained by using a graphics processing unit (GPU) as a massively parallel computing device. However, GPUs require highly regular control flow and memory access patterns, and for this reason previous GPU methods for intersecting sets have used a simple bitmap representation. This representation requires excessive space on sparse data sets. In this paper we present a novel data layout, "BatMap", that is particularly well suited for parallel processing and is compact even for sparse data. Frequent item set mining is one of the most important applications of set intersection. As a case study on the potential of BatMaps we focus on frequent pair mining, which is a core special case of frequent item set mining. The main finding is that our method achieves speedups over both Apriori and FP-growth when the number of distinct items is large and the density of the problem instance is above 0.01. Previous implementations of frequent item set mining on GPUs have not been able to show speedups over the best single-threaded implementations.
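To make the baseline concrete, the following is a minimal sketch of the simple bitmap representation that the abstract attributes to earlier GPU methods, applied to frequent pair mining on the CPU. It is not the BatMap layout, which the abstract does not specify. All function names (`to_bitmap`, `intersection_size`, `bitmap_bytes`, `frequent_pairs`) and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def to_bitmap(items, universe_size):
    """Encode a set of integers in [0, universe_size) as an int used as a bitmap."""
    bitmap = 0
    for x in items:
        if not 0 <= x < universe_size:
            raise ValueError(f"item {x} outside universe of size {universe_size}")
        bitmap |= 1 << x
    return bitmap


def intersection_size(a, b):
    """Size of the intersection of two bitmaps: bitwise AND, then count set bits."""
    return bin(a & b).count("1")


def bitmap_bytes(universe_size):
    """Bytes needed for one bitmap, regardless of how few elements the set holds."""
    return (universe_size + 7) // 8


def frequent_pairs(transactions, min_support):
    """Return {(i, j): support} for item pairs occurring in >= min_support transactions.

    Each item is represented by a bitmap over transaction ids, so the support
    of a pair is the size of the intersection of the two items' bitmaps.
    """
    n = len(transactions)
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    bitmaps = {item: to_bitmap(tids, n) for item, tids in tidsets.items()}

    result = {}
    for i, j in combinations(sorted(bitmaps), 2):
        support = intersection_size(bitmaps[i], bitmaps[j])
        if support >= min_support:
            result[(i, j)] = support
    return result


if __name__ == "__main__":
    # Hypothetical toy data: four transactions over items 0..3.
    transactions = [{0, 1, 2}, {0, 2}, {1, 2}, {0, 1, 2, 3}]
    print(frequent_pairs(transactions, min_support=2))
    # A set of 100 elements from a universe of 1,000,000 still needs a full bitmap,
    # while a plain list of 4-byte ids would need about 400 bytes.
    print(bitmap_bytes(1_000_000))
```

The bitwise AND is what makes bitmaps attractive for regular, data-parallel hardware. The cost is that each bitmap has a fixed size proportional to the universe, independent of how many elements the set contains, which is the space overhead on sparse data that the abstract refers to.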
