A Scalable Software Framework for Stateful Stream Data Processing on Multiple GPUs and Applications

Over the past few years, increases in computational power have been achieved by using more processors with multiple cores and specialized processing units such as graphics processing units (GPUs). In addition, programming languages such as CUDA and OpenCL have made it easier, even for non-graphics programmers, to exploit the computational power of the massively parallel processors available in current GPUs. Although CUDA and OpenCL relieve programmers from considering many low-level details of parallel programming across the cores of a single GPU, comparable support at the higher level of parallelization across multiple GPUs is still a subject of research. In particular, the programmer must still deal directly with fundamental issues of memory management and synchronization. In this chapter, we introduce concepts for CUDA-based frameworks that are designed for stateful stream data processing with graph-like arrangements of processing modules on two or more GPUs in a single compute node. We evaluate these concepts and elaborate on the approach we chose. Our approach relieves the programmer of the error-prone chores of memory management and synchronization. The chapter presents detailed evaluation results that demonstrate the scalability of the proposed framework. To demonstrate its usability, we apply the framework to demanding online processing tasks in the areas of crystallographic structure detection and video decryption.
