Single window stream aggregation using reconfigurable hardware

High-throughput, low-latency stream aggregation, and stream processing in general, is critical for many emerging applications that analyze massive volumes of continuously produced data on the fly to make real-time decisions. In many cases, high-speed stream aggregation can be achieved incrementally by computing partial results for multiple windows. For some problems, however, storing all incoming raw data in a single window before processing is more efficient, or even the only option. This paper presents the first FPGA-based single window stream aggregation design. Using Maxeler's dataflow engines (DFEs), up to 8 million tuples per second can be processed (1.1 Gbps), offering 1–2 orders of magnitude higher throughput than a state-of-the-art stream processing software system. DFEs have a direct feed of incoming data from the network as well as direct access to off-chip DRAM, and process a tuple in less than 4 μs, which is 4 orders of magnitude lower latency than software. The proposed approach supports challenging queries required in realistic stream processing problems (e.g. holistic functions). Our design offers aggregation for up to 1 million concurrently active keys and handles large windows storing up to 6144 values (24 KB) per key.
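To make the single window idea concrete, the following is a minimal software sketch in Python, not the FPGA design described in the paper. It keeps the raw values of each key in one bounded window and evaluates a holistic function (here the median) over the stored values, which cannot be derived by merging partial per-pane results. The class name `SingleWindowAggregator`, its methods, and the count-based eviction policy are assumptions made for illustration; only the capacity of 6144 values per key comes from the abstract.

```python
from collections import defaultdict, deque
from statistics import median

# Per-key window capacity quoted in the abstract (6144 values per key).
WINDOW_CAPACITY = 6144


class SingleWindowAggregator:
    """Hypothetical software analogue of single window aggregation.

    Assumption: the window is count-based, so the oldest value of a key
    is evicted once that key already holds `capacity` values.
    """

    def __init__(self, capacity=WINDOW_CAPACITY):
        self.capacity = capacity
        # One bounded buffer of raw values per key.
        self.windows = defaultdict(lambda: deque(maxlen=capacity))

    def insert(self, key, value):
        # Store the raw value; deque(maxlen=...) drops the oldest entry
        # automatically when the window is full.
        self.windows[key].append(value)

    def aggregate(self, key, fn=median):
        # Holistic functions need every raw value in the window,
        # so the function is applied to the whole buffer.
        window = self.windows.get(key)
        if not window:
            return None
        return fn(window)


if __name__ == "__main__":
    agg = SingleWindowAggregator(capacity=3)
    for v in (1, 5, 3, 9):
        agg.insert("sensor-a", v)
    # The window now holds the last three values: 5, 3, 9.
    print(agg.aggregate("sensor-a"))
```

The sketch recomputes the aggregate from scratch on every call, which is the simplest correct behaviour for a holistic function; it says nothing about how the hardware design organises its memory or pipeline.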
