Combining Distributed Computing and Massively Parallel Devices to Accelerate Stream Data Processing

Data streaming systems have been successfully employed for various data processing tasks. Their main benefit is that they simplify the design of data-intensive applications and open up many opportunities for task, data, and pipeline parallelism. In this work, we propose an enhancement for data streaming systems that allows distributed processing of data streams and also introduces parallel accelerators, which can be used for data-parallel subtasks. We verify the viability of our approach by integrating support for heterogeneous accelerators into the Bobox system, a parallel framework for data stream processing.
