A Cache-Efficient Sorting Algorithm for Database and Data Mining Computations using Graphics Processors

We present a fast sorting algorithm using graphics processors (GPUs) that adapts well to database and data mining applications. Our algorithm uses the texture mapping and blending functionality of GPUs to implement an efficient bitonic sorting network. We account for the communication bandwidth overhead to the video memory on GPUs and reduce the memory bandwidth requirements. We also present strategies to exploit the tile-based computational model of GPUs. Our new algorithm has a memory-efficient data access pattern, and we describe an efficient instruction dispatch mechanism that improves overall sorting performance. We have used our sorting algorithm to accelerate join-based queries and stream mining algorithms. Our results indicate up to an order of magnitude improvement over prior CPU-based and GPU-based sorting algorithms.
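
As an illustration of the bitonic sorting network mentioned above, the following is a minimal CPU-side sketch in Python. It is not the GPU implementation described in this paper: the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for the example. Each compare-exchange step is written with `min` and `max`, loosely mirroring how a blending-based approach would compute the smaller and larger of two values; the texture-mapping and tiling strategies are not modeled.

```python
def bitonic_sort(values):
    """Sort a sequence with a bitonic network (length must be a power of two)."""
    n = len(values)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:             # compare-exchange distance in this step
            for i in range(n):
                partner = i ^ j
                if partner > i:   # handle each pair exactly once
                    lo = min(data[i], data[partner])
                    hi = max(data[i], data[partner])
                    if i & k == 0:
                        # ascending block: smaller value goes first
                        data[i], data[partner] = lo, hi
                    else:
                        # descending block: larger value goes first
                        data[i], data[partner] = hi, lo
            j //= 2
        k *= 2
    return data


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

Every step compares the same fixed pairs of positions regardless of the data, which is why the network maps naturally onto data-parallel hardware: all pairs in one step can be processed independently.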
