A parallel selection sorting algorithm on GPUs using binary search

This paper describes a hybrid sorting algorithm that combines radix sort and selection sort on a graphics processing unit (GPU). The proposed algorithm is based on the “Split and Concurrent Selection” (SCS) strategy. First, the data sequence is split into several pieces that are sorted in parallel using radix sort. A parallel selection sort is then applied to obtain the final sorted sequence: it finds the correct position of each element of the data sequence and then copies each element to its corresponding position. The paper analyses the computational complexity of the proposed parallel sorting algorithm and compares it with other existing algorithms. The algorithm is implemented using CUDA 5.0, and results are evaluated on a Tesla C2075 GPU. Experimental results are compared with those of the best sequential sorting algorithm and an odd-even merge sort based parallel sorting algorithm. The proposed algorithm shows up to a 50-fold speedup over the serial algorithm and a twofold speedup over the parallel algorithm.
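The two phases described above can be illustrated with a small sequential sketch. It is not the authors' CUDA implementation: the function name `scs_sort` and the parameter `num_chunks` are hypothetical, Python's built-in `sorted` stands in for the per-piece radix sort, and the tie-breaking rule for equal keys is an assumption. The loops over pieces and elements are independent of each other, which is what would let a GPU assign one thread per element.

```python
from bisect import bisect_left, bisect_right


def scs_sort(data, num_chunks=4):
    """Sequential sketch of a split-then-rank sort (hypothetical helper).

    Phase 1: split the input into pieces and sort each piece
             (sorted() is a stand-in for a parallel radix sort).
    Phase 2: for every element, find its final position by binary
             search in every other piece, then copy it there.
    """
    n = len(data)
    if n == 0:
        return []
    size = -(-n // num_chunks)  # ceiling division: elements per piece
    chunks = [sorted(data[s:s + size]) for s in range(0, n, size)]

    out = [None] * n
    for c, chunk in enumerate(chunks):
        for i, x in enumerate(chunk):
            # i elements of the same piece precede x.
            pos = i
            for d, other in enumerate(chunks):
                if d < c:
                    # Assumed tie-break: equal keys in earlier pieces go first.
                    pos += bisect_right(other, x)
                elif d > c:
                    # Equal keys in later pieces go after x.
                    pos += bisect_left(other, x)
            out[pos] = x  # each element gets a unique final position
    return out


if __name__ == "__main__":
    print(scs_sort([5, 3, 8, 3, 1, 9, 2, 3]))
```

Because ties are resolved by piece index, no two elements compute the same position, so the copy step needs no synchronization between writers in this sketch.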
