CUDA Based Parallel Implementations of Space-Saving on a GPU

We present four CUDA-based parallel implementations of the Space-Saving algorithm for determining frequent items on a GPU. The first variant uses the open-source CUB library to simplify the implementation of a user-defined reduction, whilst the second is based on our own implementation of the parallel reduction. The third and fourth, built on the previous variants, aim to improve performance by taking advantage of hardware-based atomic instructions. In particular, we implement a warp-based ballot mechanism to accelerate the Space-Saving updates. We show that our implementation of the parallel reduction, coupled with the ballot-based update mechanism, is the fastest, and we provide extensive experimental results regarding its performance.
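To make the two building blocks named above concrete, the following is a minimal sequential C++ sketch of a Space-Saving update and of a merge operator of the kind a user-defined reduction could apply to pairs of summaries. It is not the GPU code described here: the names `Summary`, `update`, `min_count`, and `merge_summaries` are hypothetical, the hash-map layout is chosen for brevity, and the merge rule (add the other summary's minimum count to items it does not monitor, then keep the `k` largest counters) is the rule commonly described for parallel Space-Saving, assumed rather than taken from this text. The sketch also assumes `k >= 1`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical host-side sketch; all names are illustrative assumptions.
struct Summary {
    std::size_t k;  // maximum number of counters (assumed >= 1)
    std::unordered_map<std::uint32_t, std::uint64_t> counts;  // item -> estimate

    explicit Summary(std::size_t capacity) : k(capacity) {}

    // Smallest estimate, or 0 while the summary still has free counters.
    std::uint64_t min_count() const {
        if (counts.size() < k) return 0;
        std::uint64_t m = std::numeric_limits<std::uint64_t>::max();
        for (const auto& kv : counts) m = std::min(m, kv.second);
        return m;
    }

    // Space-Saving update for a single stream item.
    void update(std::uint32_t item) {
        auto it = counts.find(item);
        if (it != counts.end()) { ++it->second; return; }          // already monitored
        if (counts.size() < k) { counts.emplace(item, 1); return; } // free counter
        // Summary is full: evict a minimum counter and inherit its count plus one.
        auto victim = std::min_element(
            counts.begin(), counts.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; });
        const std::uint64_t inherited = victim->second + 1;
        counts.erase(victim);
        counts.emplace(item, inherited);
    }
};

// Merge two summaries into one with capacity a.k (assumed merge rule, see lead-in).
Summary merge_summaries(const Summary& a, const Summary& b) {
    const std::uint64_t min_a = a.min_count();
    const std::uint64_t min_b = b.min_count();
    std::vector<std::pair<std::uint32_t, std::uint64_t>> all;
    for (const auto& kv : a.counts) {
        auto it = b.counts.find(kv.first);
        // Items missing from b could have occurred up to min_b times there.
        all.emplace_back(kv.first,
                         kv.second + (it != b.counts.end() ? it->second : min_b));
    }
    for (const auto& kv : b.counts) {
        if (a.counts.find(kv.first) == a.counts.end())
            all.emplace_back(kv.first, kv.second + min_a);
    }
    // Keep only the k counters with the largest estimates.
    const std::size_t keep = std::min(a.k, all.size());
    std::partial_sort(all.begin(), all.begin() + keep, all.end(),
                      [](const auto& x, const auto& y) { return x.second > y.second; });
    Summary out(a.k);
    for (std::size_t i = 0; i < keep; ++i)
        out.counts.emplace(all[i].first, all[i].second);
    return out;
}
```

In a parallel setting, each thread or block would build its own `Summary` over a slice of the input and the partial summaries would then be combined pairwise with `merge_summaries`. On a GPU that combination step is where a library reduction or a hand-written one would plug in.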
