Performance modeling in CUDA streams — A means for high-throughput data processing

A push-based database management system (DBMS) is a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rate of such systems demands substantial computing power from the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query operations and to enable resource allocation among them. Understanding how the performance of an operation depends on the resources it consumes is therefore a prerequisite for the design of G-SDMS. With NVIDIA's CUDA framework as the implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism known as CUDA streams. Specifically, we explore the connection between the performance and the resource occupancy of compute-bound kernels, and we develop a model that predicts the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize its main kernel scheduling disciplines. Our models and the derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels.
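A small calculation can make the notion of "resource occupancy" concrete. The Python sketch below is a generic, simplified occupancy estimate. It is not the performance model developed in this work. It shows how a kernel's threads, registers, and shared memory cap the number of its blocks that can reside on one streaming multiprocessor (SM) at the same time. It also shows how blocks of one kernel that are already resident, for example a kernel launched into another CUDA stream, shrink the space left for a second kernel. The sketch rests on several assumptions. The per-SM limits in `SMLimits` are illustrative, and allocation granularity is ignored. All names (`SMLimits`, `KernelConfig`, `blocks_per_sm`, `occupancy`, `remaining_blocks`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical names; a simplified occupancy estimate, not the paper's model.
# Register/shared-memory allocation granularity is ignored, so real GPUs may
# fit fewer blocks than computed here.


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource limits; the defaults are illustrative assumptions."""
    max_threads: int = 2048           # resident threads per SM
    max_blocks: int = 16              # resident thread blocks per SM
    registers: int = 65536            # 32-bit registers per SM
    shared_mem_bytes: int = 48 * 1024
    warp_size: int = 32


@dataclass(frozen=True)
class KernelConfig:
    threads_per_block: int
    registers_per_thread: int         # assumed >= 1
    shared_mem_per_block: int = 0     # bytes


def _fit(k: KernelConfig, threads: int, regs: int, smem: int, blocks: int) -> int:
    """Blocks of k that fit into the given (possibly leftover) SM resources."""
    by_smem = smem // k.shared_mem_per_block if k.shared_mem_per_block else blocks
    return min(blocks,
               threads // k.threads_per_block,
               regs // (k.registers_per_thread * k.threads_per_block),
               by_smem)


def blocks_per_sm(k: KernelConfig, sm: SMLimits) -> int:
    """Resident blocks of k on an otherwise idle SM."""
    return _fit(k, sm.max_threads, sm.registers, sm.shared_mem_bytes, sm.max_blocks)


def occupancy(k: KernelConfig, sm: SMLimits) -> float:
    """Active warps per SM divided by the maximum warps per SM."""
    warps_per_block = -(-k.threads_per_block // sm.warp_size)  # ceiling division
    max_warps = sm.max_threads // sm.warp_size
    return blocks_per_sm(k, sm) * warps_per_block / max_warps


def remaining_blocks(a: KernelConfig, a_blocks: int,
                     b: KernelConfig, sm: SMLimits) -> int:
    """Blocks of kernel b that still fit on an SM already hosting a_blocks
    blocks of kernel a (e.g., a and b launched into different streams)."""
    left = (sm.max_threads - a_blocks * a.threads_per_block,
            sm.registers - a_blocks * a.registers_per_thread * a.threads_per_block,
            sm.shared_mem_bytes - a_blocks * a.shared_mem_per_block,
            sm.max_blocks - a_blocks)
    if min(left) < 0:
        raise ValueError("a_blocks exceeds the SM's capacity")
    return _fit(b, *left)
```

Consider a 256-thread kernel that uses 32 registers per thread under these assumed limits. It reaches 8 blocks per SM and full occupancy (1.0). Doubling its register use to 64 per thread makes it register-bound at 4 blocks, which gives an occupancy of 0.5. Those 4 blocks consume every register on the SM. A second kernel from another stream therefore cannot place any blocks there, even though 1024 thread slots remain free.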
