Concurrent data structures for efficient streaming aggregation

We briefly describe our study of streaming multiway aggregation, in which large data volumes are received from multiple input streams. Multiway aggregation is a fundamental computational component of data stream management systems and requires low-latency, high-throughput solutions. We focus on designing concurrent data structures that enable low-latency, high-throughput multiway aggregation, an issue that has been overlooked in the literature. We propose two new concurrent data structures and their lock-free, linearizable implementations, supporting both order-sensitive and order-insensitive aggregate functions. Results from an extensive evaluation show significant improvements in aggregation performance, in terms of both processing throughput and latency, over the commonly used queue-based techniques.
