Toward GPUs being mainstream in analytic processing: An initial argument using simple scan-aggregate queries

A number of research proposals use discrete graphics processing units (GPUs) to accelerate database operations. Although many of these works show up to an order of magnitude performance improvement, discrete GPUs are not commonly used in modern database systems. However, there is now a proliferation of integrated GPUs, which sit on the same silicon die as the conventional CPU. With the advent of new programming models such as Heterogeneous System Architecture (HSA), these integrated GPUs are considered first-class compute units, with transparent access to CPU virtual addresses and very low overhead for computation offloading. We show that integrated GPUs significantly reduce the overheads of using GPUs in a database environment. Specifically, an integrated GPU is 3x faster than a discrete GPU even though the discrete GPU has 4x the computational capability. Therefore, we develop high-performance scan and aggregate algorithms for the integrated GPU. We show that the integrated GPU can outperform a four-core CPU with SIMD extensions by an average of 30% (up to 3.2x) and reduces energy consumption by an average of 45% on 16 TPC-H queries.
