Parallel Mining of Correlated Heavy Hitters on Distributed and Shared-Memory Architectures

We present parallel algorithms for mining Correlated Heavy Hitters from a two-dimensional data stream. In particular, we design and implement a message-passing algorithm, a shared-memory algorithm and a hybrid algorithm. To the best of our knowledge, these are the first parallel algorithms for this problem. Experimental results show that our algorithms scale very well while retaining the accuracy of their sequential counterpart.
