Efficient frequent item counting in multi-core hardware

The increasing number of cores and the rich instruction sets of modern hardware open up new opportunities for optimizing many traditional data mining tasks. In this paper we demonstrate how to speed up the computation of frequent items by almost one order of magnitude over the best published results by matching the algorithm to the underlying hardware architecture. We start with the observation that frequent item counting, like other data mining tasks, assumes a certain amount of skew in the data. We exploit this skew to design a new algorithm with a pre-filtering stage that can be implemented very efficiently using SIMD instructions. Using pipelining, we then combine this pre-filtering stage with a conventional frequent item algorithm (Space-Saving) that processes the remainder of the data. The resulting operator can be parallelized with a small number of cores, leading to a parallel implementation that offers significantly higher throughput and avoids the overheads that existing parallel solutions incur when querying the results.
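The two-stage idea can be illustrated with a minimal, single-threaded sketch. It is not the paper's implementation. A plain dictionary lookup stands in for the SIMD comparison, there is no pipelining across cores, and the filter contents are fixed up front through a hypothetical `hot_items` argument, because the abstract does not say how the filter is populated. The names `SpaceSaving`, `offer`, and `count_with_prefilter` are likewise chosen for illustration only.

```python
class SpaceSaving:
    """Space-Saving summary that monitors at most `capacity` items."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def offer(self, item):
        if item in self.counts:
            # Item is already monitored: increment its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free slot available: start monitoring the item.
            self.counts[item] = 1
        else:
            # Summary is full: replace the item with the smallest count.
            # The newcomer inherits that count plus one.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1


def count_with_prefilter(stream, hot_items, capacity):
    """Count `hot_items` exactly and send everything else to Space-Saving.

    `hot_items` is an assumption of this sketch: a small, fixed set of
    items expected to be frequent because the data is skewed.
    """
    filter_counts = {item: 0 for item in hot_items}
    summary = SpaceSaving(capacity)
    for item in stream:
        if item in filter_counts:
            # Filter hit: the item never reaches the second stage.
            filter_counts[item] += 1
        else:
            # Filter miss: the remainder of the data goes to Space-Saving.
            summary.offer(item)
    return filter_counts, summary.counts
```

With skewed input, most items hit the filter, so the more expensive Space-Saving update runs only on the remainder of the stream. That is the property the pre-filtering stage relies on.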
