A portable implementation of the radix sort algorithm in OpenCL

We present a portable OpenCL implementation of the radix sort algorithm. We test it on several GPUs and CPUs to assess its performance across different hardware. We also apply our implementation to particle sorting in Particle-In-Cell (PIC) codes, which is useful in plasma physics simulations.
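
For readers unfamiliar with the algorithm, the following is a minimal sequential sketch of a least-significant-digit radix sort in Python. It is not the OpenCL implementation described above. It only illustrates the three steps that each pass of a radix sort typically performs (histogram, exclusive prefix sum, stable reorder). The function name `radix_sort` and the parameters `bits_per_pass` and `key_bits` are hypothetical and chosen for illustration.

```python
def radix_sort(keys, bits_per_pass=4, key_bits=32):
    """Sort non-negative integers smaller than 2**key_bits (LSD radix sort).

    Illustrative sketch only: parameter names and defaults are assumptions,
    not taken from the paper.
    """
    radix = 1 << bits_per_pass          # number of possible digit values
    mask = radix - 1                    # extracts one digit after shifting
    keys = list(keys)

    for shift in range(0, key_bits, bits_per_pass):
        # Step 1: histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1

        # Step 2: exclusive prefix sum (scan) turns counts into start offsets.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]

        # Step 3: stable reorder; keys with equal digits keep their order.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out

    return keys
```

In a data-parallel setting, the histogram and reorder steps are usually split across work-items, and the prefix sum is computed with a parallel scan. The sketch above keeps everything sequential so that the structure of one pass stays visible.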
