Towards a Hybrid Design for Fast Query Processing in DB2 with BLU Acceleration Using Graphical Processing Units: A Technology Demonstration

In this paper, we show how we use Nvidia GPUs together with host CPU cores to speed up query processing in a DB2 database using BLU Acceleration (DB2's column store technology). We also show the benefits and problems of using hardware accelerators (specifically GPUs) in a real commercial Relational Database Management System (RDBMS). We investigate the effect of off-loading specific database operations to a GPU and show that doing so results in a significant performance improvement. We then demonstrate that, for some queries, performing the entire operation on the CPU alone is more beneficial. While we use some of Nvidia's fast kernels for operations such as sort, we have also developed our own high-performance kernels for operations such as group by and aggregation. Finally, we show how a dynamic design can use optimizer metadata to intelligently choose which GPU kernel to run. For the first time in the literature, we use benchmarks representative of customer environments to gauge the performance of our prototype; the results show a speedup upwards of 2x on a realistic set of queries.
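The idea of using optimizer metadata to pick an execution path can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the DB2 prototype: the `OptimizerEstimate` fields, the threshold constants, and the path names are all hypothetical. The sketch only shows the general shape of such a decision: stay on the CPU when the input is too small to amortize the transfer cost or too large for device memory, and otherwise pick a group-by kernel based on the estimated number of groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizerEstimate:
    """Hypothetical optimizer metadata for one group-by operation."""
    rows: int       # estimated input cardinality
    groups: int     # estimated number of distinct groups
    row_bytes: int  # estimated bytes per row shipped to the device


# Illustrative thresholds only; real values would have to be measured.
MIN_ROWS_FOR_GPU = 1_000_000     # below this, transfer overhead dominates
SMALL_GROUP_LIMIT = 4_096        # groups assumed to fit in fast on-chip memory
GPU_MEMORY_BYTES = 8 * 1024**3   # assumed device memory budget (8 GiB)


def choose_path(est: OptimizerEstimate) -> str:
    """Return the name of the execution path for the given estimate."""
    if est.rows < MIN_ROWS_FOR_GPU:
        return "cpu"                      # too little work to off-load
    if est.rows * est.row_bytes > GPU_MEMORY_BYTES:
        return "cpu"                      # input does not fit on the device
    if est.groups <= SMALL_GROUP_LIMIT:
        return "gpu_groupby_small"        # hypothetical few-groups kernel
    return "gpu_groupby_large"            # hypothetical many-groups kernel
```

For example, `choose_path(OptimizerEstimate(rows=50_000_000, groups=100, row_bytes=16))` selects the hypothetical few-groups GPU kernel, while an estimate of only 10,000 rows stays on the CPU.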
