Efficient co-processor utilization in database query processing

Specialized processing units such as GPUs or FPGAs provide great opportunities to speed up database operations by exploiting parallelism and relieving the CPU. However, distributing a workload across suitable (co-)processors is challenging because of the heterogeneous nature of a hybrid processor/co-processor system. In this paper, we present a framework that automatically learns and adapts execution models for arbitrary algorithms on any (co-)processor. Our physical optimizer uses these execution models to distribute a workload of database operators across the available (co-)processing devices. We demonstrate its applicability for two common use cases in modern database systems. Additionally, we contribute an overview of GPU co-processing approaches, an in-depth discussion of our framework's operator model, the steps required to deploy our framework in practice, and support for complex operators that require multi-dimensional learning strategies.
