The computing power of Graphics Processing Units (GPUs) has increased rapidly in recent years. By providing a vast number of simple, data-parallel, multithreaded cores, they offer much more computational power than contemporary CPUs. In this paper, we propose an improved implementation of parallel selection and compare the performance of different parallel selection algorithms on the current generation of NVIDIA GPUs. That is, given a very large array of elements, we are interested in how to use a GPU to efficiently select the elements that meet certain criteria and store them in a target array for further processing. We discuss in detail the optimization techniques used and the implementation issues encountered. Furthermore, experimental results show that our improved implementation runs on average 2.88 times faster than Thrust, an open-source parallel algorithms library.
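To make the selection problem concrete, the following is a minimal sequential C++ sketch of the flag, scan, and scatter pattern that is commonly used for selection on data-parallel hardware. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function name `select_if` is hypothetical, and each loop stands in for a step that a GPU would execute with many threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): returns the elements of `in`
// that satisfy `pred`, in their original order.
template <typename T, typename Pred>
std::vector<T> select_if(const std::vector<T>& in, Pred pred) {
    const std::size_t n = in.size();

    // Step 1: flag each element. Every iteration is independent, so on a
    // GPU this step could be assigned one thread per element.
    std::vector<std::size_t> flags(n);
    for (std::size_t i = 0; i < n; ++i) {
        flags[i] = pred(in[i]) ? 1 : 0;
    }

    // Step 2: exclusive prefix sum over the flags. offsets[i] is the number
    // of selected elements before position i, i.e. the output index of
    // element i if it is selected. Written sequentially here; a GPU version
    // would use a parallel scan instead.
    std::vector<std::size_t> offsets(n);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        offsets[i] = count;
        count += flags[i];
    }

    // Step 3: scatter. Each selected element writes to its own slot, so
    // there are no write conflicts between iterations.
    std::vector<T> out(count);
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i]) {
            assert(offsets[i] < count);
            out[offsets[i]] = in[i];
        }
    }
    return out;
}
```

For example, calling `select_if` on the array {5, -2, 8, 0, -7, 3} with the predicate `x > 0` computes the flags {1, 0, 1, 0, 0, 1} and the offsets {0, 1, 1, 2, 2, 2}, and returns {5, 8, 3}. The prefix sum is what lets every selected element know its destination without coordinating with the others.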