AN IMPROVED IMPLEMENTATION OF PARALLEL SELECTION ON GPUs

The computing power of Graphics Processing Units (GPUs) has increased rapidly over the years. By providing a vast number of simple, data-parallel, multithreaded cores, current GPUs offer far more computational power than recent CPUs. In this paper, we propose an improved implementation of parallel selection and compare the performance of different parallel selection algorithms on the current generation of NVIDIA GPUs. That is, given a very large array of elements, we investigate how a GPU can efficiently select the elements that meet certain criteria and store them in a target array for further processing. The optimization techniques used and the implementation issues encountered are discussed in detail. The experimental results show that our improved implementation runs on average 2.88 times faster than Thrust, an open-source library of parallel algorithms.
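Selection of this kind is commonly built on scan-based stream compaction. Each element is first flagged with the predicate. An exclusive prefix sum of the flags then gives every selected element its position in the target array, and a final step scatters the selected elements to those positions. The following is a minimal sequential C++ sketch of that general pattern, not the paper's GPU implementation. The function name `select_if` is hypothetical. On a GPU, each loop would be run by many threads in parallel, with a parallel scan in the second phase.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: scan-based stream compaction written sequentially.
// On a GPU each phase is data-parallel (one thread per element).
template <typename T, typename Pred>
std::vector<T> select_if(const std::vector<T>& in, Pred pred) {
    const std::size_t n = in.size();

    // Phase 1: evaluate the predicate, producing a 0/1 flag per element.
    std::vector<std::size_t> flags(n);
    for (std::size_t i = 0; i < n; ++i) flags[i] = pred(in[i]) ? 1 : 0;

    // Phase 2: exclusive prefix sum of the flags; pos[i] is the output
    // index of element i if it is selected, and `running` ends up as the
    // total number of selected elements.
    std::vector<std::size_t> pos(n);
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        pos[i] = running;
        running += flags[i];
    }

    // Phase 3: scatter the selected elements into the target array.
    // Relative order of the input is preserved (stable compaction).
    std::vector<T> out(running);
    for (std::size_t i = 0; i < n; ++i)
        if (flags[i]) out[pos[i]] = in[i];
    return out;
}
```

For example, `select_if(std::vector<int>{5, 2, 8, 1, 4, 7, 6}, [](int x) { return x % 2 == 0; })` yields `{2, 8, 4, 6}`. Separating the flag, scan, and scatter phases is what makes the operation parallelizable: no element needs to know about any other except through the prefix sum.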