Performance Evaluation of Parallel Count Sort using GPU Computing with CUDA

Objective: Sorting is an important operation in many areas of computer science, and the parallelization of sorting algorithms with GPU computing on CUDA hardware is growing rapidly. The aim of using GPU computing is to speed up these algorithms. Methods: This paper focuses on count sort, an efficient sort that runs in linear time, O(n + k) for n elements whose keys span a range of size k. Its drawback is that it is not recommended for larger data sets, because its cost depends on the range of the key elements. We take this drawback as our research concern and parallelize count sort using GPU computing with CUDA. Findings: We measured the speedup achieved by parallel count sort over sequential count sort, using a sorting benchmark to test both versions. The benchmark has six types of test cases: uniform, bucket, Gaussian, sorted, staggered, and zero. Both versions were tested on data sets ranging from N = 1,000 to N = 10,000,000. Improvement: In testing, the parallel count sort ran 66 times faster than the sequential version on the Gaussian test case, and it outperformed the sequential version in all test cases.
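To make the dependence on the key range concrete, the following is a minimal sequential sketch of count sort in Python. It is not the authors' implementation: the function name `counting_sort` and the parameter `max_key` are illustrative assumptions, and keys are assumed to be non-negative integers no larger than `max_key`.

```python
def counting_sort(keys, max_key):
    """Sort non-negative integers in the range [0, max_key] (illustrative sketch)."""
    # Phase 1: histogram. The table has max_key + 1 slots, so memory and time
    # grow with the key range, not only with the number of elements.
    counts = [0] * (max_key + 1)
    for k in keys:
        counts[k] += 1

    # Phase 2: exclusive prefix sum turns counts into starting offsets.
    offsets = [0] * (max_key + 1)
    running = 0
    for v in range(max_key + 1):
        offsets[v] = running
        running += counts[v]

    # Phase 3: scatter each key to its output position (stable order).
    out = [0] * len(keys)
    for k in keys:
        out[offsets[k]] = k
        offsets[k] += 1
    return out


data = [4, 1, 3, 4, 0, 1]
print(counting_sort(data, max_key=4))  # prints [0, 1, 1, 3, 4, 4]
```

The histogram and scatter loops touch each element independently, which is why they are natural candidates for distribution across GPU threads. How the paper's CUDA version organizes these phases is not stated in the abstract.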
