Toward GPU Accelerated Data Stream Processing

In recent years, the need for continuous processing and analysis of data streams has increased rapidly. To achieve high throughput rates, stream applications make use of operator parallelization, batching strategies, and distribution. Another possibility is to utilize co-processor capabilities per operator. Further, the database community has noticed that a column-oriented architecture is essential for efficient co-processing, since the data transfer overhead is smaller than when transferring whole tables. However, current systems still rely on a row-wise architecture for stream processing, because streams require data structures suited to high velocity. In contrast, stream portions are at rest while they are bound to a window. This allows us to change the per-window event representation from row to column orientation, which enables us to exploit GPU acceleration. Providing general-purpose GPU capabilities for stream processing is challenging, however, because window sizes vary. Since very large windows cannot be passed directly to the GPU, we propose to split the variable-length windows into fixed-size window portions, each of which has a column-oriented event representation. In this paper, we present a time- and space-efficient concept for this task that is free of data corruption. Finally, we identify open research challenges related to co-processing in the context of stream processing.
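The splitting idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `split_window`, the dictionary layout of a portion, and the choice to pad the last portion with a fill value are all assumptions made for illustration. The sketch takes a window of row-oriented events and returns fixed-size portions, each holding one list per attribute.

```python
from typing import Any, Dict, List, Sequence


def split_window(events: Sequence[Dict[str, Any]],
                 fields: Sequence[str],
                 portion_size: int,
                 pad: Any = None) -> List[Dict[str, Any]]:
    """Split a row-oriented window into fixed-size, column-oriented portions.

    Hypothetical helper: the portion layout and the padding of the last
    portion are illustrative assumptions, not taken from the paper.
    """
    if portion_size <= 0:
        raise ValueError("portion_size must be positive")

    portions = []
    for start in range(0, len(events), portion_size):
        chunk = events[start:start + portion_size]
        missing = portion_size - len(chunk)
        # One contiguous list per attribute instead of one record per event.
        columns = {f: [e[f] for e in chunk] + [pad] * missing for f in fields}
        # "count" records how many slots hold real events.
        portions.append({"count": len(chunk), "columns": columns})
    return portions


# A window of five events split into portions of two events each.
window = [{"ts": t, "value": t * 10} for t in range(5)]
portions = split_window(window, ["ts", "value"], portion_size=2)
```

Because every portion has the same length, a transfer buffer of one fixed size could be reused for all portions, and an operator that reads only `value` would not need the `ts` column to be transferred at all.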
