Low-Latency Handshake Join

This work revisits the processing of stream joins on modern hardware architectures. We build on the recently proposed handshake join algorithm, a mechanism for parallelizing stream joins in a NUMA-aware and hardware-friendly manner. Handshake join achieves high throughput and scalability, but it suffers from a high latency penalty and a non-deterministic ordering of the tuples in the physical result stream. In this paper, we first characterize the latency behavior of handshake join and then propose a new low-latency handshake join algorithm, which substantially reduces latency without sacrificing throughput or scalability. We also present a technique to generate punctuated result streams with very little overhead; such punctuations allow correctly ordered physical output streams to be generated with negligible effect on overall throughput and latency.
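
To illustrate how punctuations can turn an unordered result stream into an ordered one, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical encoding in which the stream carries `("result", timestamp, payload)` items interleaved with `("punct", bound)` items, and it assumes a punctuation promises that no later result has a timestamp less than or equal to `bound`. The function name `order_by_punctuation` is likewise made up for this example.

```python
import heapq
from typing import Any, Iterable, Iterator, Tuple


def order_by_punctuation(stream: Iterable[tuple]) -> Iterator[Tuple[int, Any]]:
    """Emit (timestamp, payload) pairs in timestamp order.

    Assumed (hypothetical) item encodings:
      ("result", timestamp, payload)  - a join result, possibly out of order
      ("punct", bound)                - promise: no later result has
                                        timestamp <= bound
    """
    heap = []  # min-heap of buffered results, keyed by timestamp
    seq = 0    # arrival counter used as a tie-breaker, so payloads are never compared
    for item in stream:
        if item[0] == "result":
            _, ts, payload = item
            heapq.heappush(heap, (ts, seq, payload))
            seq += 1
        else:
            _, bound = item
            # Everything at or below the bound is now safe to release in order.
            while heap and heap[0][0] <= bound:
                ts, _, payload = heapq.heappop(heap)
                yield ts, payload
    # End of input: flush whatever is still buffered, in timestamp order.
    while heap:
        ts, _, payload = heapq.heappop(heap)
        yield ts, payload


if __name__ == "__main__":
    unordered = [
        ("result", 3, "c"),
        ("result", 1, "a"),
        ("punct", 2),
        ("result", 5, "e"),
        ("result", 4, "d"),
        ("punct", 5),
    ]
    for ts, payload in order_by_punctuation(unordered):
        print(ts, payload)
```

In this sketch, results are buffered only until the next punctuation covers them, so under these assumptions the added delay is bounded by how often punctuations arrive.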
