Parallel Mining of Correlated Heavy Hitters

We present a message-passing-based parallel algorithm for mining Correlated Heavy Hitters from a two-dimensional data stream. To the best of our knowledge, this is the first parallel algorithm for this problem. Experimental results show that our algorithm scales very well while retaining the accuracy of its sequential counterpart.
