Stream Processing on Hybrid CPU/Intel® Xeon Phi™ Systems

Stream processing is central to handling the large volumes of data that are now generated at high rates. Processing such quantities of data efficiently, however, demands massively parallel hardware. The usual approach is to rely on clusters of multiprocessors, where network communication may become a bottleneck. Some work has also been done in the field of GPU computing. Yet the programming complexity of GPUs, together with the synchronization-related overheads that arise as the streaming graph scales, has hampered their integration into Big Data streaming frameworks. In this paper we explore the unique characteristics of the Intel Xeon Phi processor to develop a stream processing framework for hybrid CPU/Intel Xeon Phi systems. We build atop the Intel Threading Building Blocks library and the Marrow algorithmic skeleton framework to offer an easily programmable, high-performance system. Our experimental results show that offloading the computationally heavy nodes of a streaming graph to the Xeon Phi can yield considerable speed-ups. Furthermore, additional gains may be obtained by sharing the processing load between the CPU(s) and the Xeon Phi processor(s).
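
The idea of sharing a heavy node's load between two devices can be illustrated with a minimal sketch. It is not the framework's API and uses neither Intel Threading Building Blocks nor Marrow. The names `split_batch`, `run_heavy_node`, and `accel_share` are hypothetical, and both "devices" are simulated by ordinary host threads, so the sketch shows only the partition-and-merge structure, not real offloading or real speed-ups.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batch(batch, accel_share):
    """Split a batch into (cpu_part, accel_part).

    accel_share is the (hypothetical) fraction of items sent to the
    accelerator; the remainder stays on the CPU.
    """
    if not 0.0 <= accel_share <= 1.0:
        raise ValueError("accel_share must lie in [0, 1]")
    cut = round(len(batch) * (1.0 - accel_share))
    return batch[:cut], batch[cut:]


def run_heavy_node(batch, kernel, accel_share=0.75):
    """Apply `kernel` to every item, sharing the batch between two workers.

    Assumption: each worker is a plain thread standing in for a device.
    A real hybrid system would launch the second part on the coprocessor.
    """
    cpu_part, accel_part = split_batch(batch, accel_share)
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_future = pool.submit(lambda: [kernel(x) for x in cpu_part])
        accel_future = pool.submit(lambda: [kernel(x) for x in accel_part])
        # Concatenating in split order preserves the original item order.
        return cpu_future.result() + accel_future.result()


def square(x):
    """Stand-in for a computationally heavy per-item kernel."""
    return x * x
```

Calling `run_heavy_node(list(range(8)), square)` returns the same list as applying `square` sequentially. In this sketch the split ratio is a fixed parameter; how a real system would choose it is not specified by the text above.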
