Fast Sorting Algorithms using AVX-512 on Intel Knights Landing

This paper describes fast sorting techniques using the recent AVX-512 instruction set. Our implementations exploit the new capabilities of AVX-512 to vectorize a two-part hybrid algorithm: we sort small arrays with a branch-free Bitonic variant, and we provide a vectorized partitioning kernel, which is the main component of the well-known Quicksort. Our algorithm sorts in-place and is straightforward to implement thanks to the new instructions. We also show how an algorithm can be adapted and implemented with AVX-512. We report a performance study on the Intel KNL where our approach is faster than the GNU C++ sort algorithm for any size, in both integer and double-precision floating-point arithmetic, by a factor of 4 on average.
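The two components named above can be illustrated with a minimal, portable C sketch. It is not the authors' AVX-512 code: it only mimics vector behavior with scalar loops, and every name in it (`LANES`, `cmp_exchange`, `bitonic_sort16`, `small_sort`, `compress_store`, `partition`, `hybrid_sort`) is hypothetical. It assumes 32-bit `int` keys, so that one 512-bit vector would hold 16 lanes. Small arrays are padded with `INT_MAX` and sorted by a fixed bitonic network whose sequence of min/max steps does not depend on the key values. Larger arrays are partitioned in place by first saving one vector's worth of keys at each end, which guarantees free slots for storing the "goes left" and "goes right" keys of every processed chunk at the two ends of the array.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define LANES 16 /* hypothetical: 16 x 32-bit int = one 512-bit vector */

/* Compare-exchange as min/max: afterwards *a <= *b. Compilers usually
   lower these ternaries to conditional moves rather than branches. */
static void cmp_exchange(int *a, int *b) {
    int lo = *a < *b ? *a : *b, hi = *a < *b ? *b : *a;
    *a = lo; *b = hi;
}

/* Bitonic network on exactly LANES keys. Its branches test indices only,
   so the same compare-exchanges run whatever the key values are. */
static void bitonic_sort16(int v[LANES]) {
    for (int k = 2; k <= LANES; k <<= 1)
        for (int j = k >> 1; j > 0; j >>= 1)
            for (int i = 0; i < LANES; i++) {
                int l = i ^ j;
                if (l <= i) continue;
                if ((i & k) == 0) cmp_exchange(&v[i], &v[l]); /* ascending pair */
                else cmp_exchange(&v[l], &v[i]);              /* descending pair */
            }
}

/* Sort n <= LANES keys: pad with INT_MAX, run the network, copy back n. */
static void small_sort(int *a, size_t n) {
    int v[LANES];
    for (size_t i = 0; i < LANES; i++) v[i] = i < n ? a[i] : INT_MAX;
    bitonic_sort16(v);
    memcpy(a, v, n * sizeof *a);
}

static int goes_left(int x, int p, int strict) { return strict ? x < p : x <= p; }

/* Scalar stand-in for two masked compress stores: keys going left are
   packed at a[*left_w..], the rest at the top of a[..*right_w). */
static void compress_store(const int *val, size_t cnt, int p, int strict,
                           int *a, size_t *left_w, size_t *right_w) {
    size_t lo = 0;
    for (size_t i = 0; i < cnt; i++) lo += (size_t)goes_left(val[i], p, strict);
    size_t lpos = *left_w, rpos = *right_w - (cnt - lo);
    for (size_t i = 0; i < cnt; i++) {
        if (goes_left(val[i], p, strict)) a[lpos++] = val[i];
        else a[rpos++] = val[i];
    }
    *left_w += lo; *right_w -= cnt - lo;
}

/* In-place partition of a[0..n); returns how many keys went left. */
static size_t partition(int *a, size_t n, int p, int strict) {
    size_t left_w = 0, right_w = n;             /* next free slots at both ends */
    int buf[2 * LANES], lvec[LANES], rvec[LANES];
    if (n < 2 * LANES) {                        /* too short: buffer it all */
        memcpy(buf, a, n * sizeof *a);
        compress_store(buf, n, p, strict, a, &left_w, &right_w);
        return left_w;
    }
    memcpy(lvec, a, sizeof lvec);               /* saving one vector per end */
    memcpy(rvec, a + n - LANES, sizeof rvec);   /* frees LANES slots per side */
    size_t left = LANES, right = n - LANES;     /* unread keys: a[left..right) */
    while (right - left >= LANES) {
        /* Read the side with less free space; both then have >= LANES free. */
        if (left - left_w <= right_w - right) {
            memcpy(buf, a + left, LANES * sizeof *a);
            left += LANES;
        } else {
            right -= LANES;
            memcpy(buf, a + right, LANES * sizeof *a);
        }
        compress_store(buf, LANES, p, strict, a, &left_w, &right_w);
    }
    size_t rem = right - left;                  /* fewer than LANES keys left */
    memcpy(buf, a + left, rem * sizeof *a);
    compress_store(buf, rem, p, strict, a, &left_w, &right_w);
    compress_store(lvec, LANES, p, strict, a, &left_w, &right_w);
    compress_store(rvec, LANES, p, strict, a, &left_w, &right_w);
    return left_w;
}

static int median3(int x, int y, int z) {
    if (x > y) { int t = x; x = y; y = t; }     /* now x <= y */
    if (y > z) y = z;                           /* y = min(y, z) */
    return x > y ? x : y;                       /* median = max(x, min(y, z)) */
}

/* Hybrid sort: network for small inputs, partition-based recursion otherwise. */
void hybrid_sort(int *a, size_t n) {
    if (n <= LANES) { small_sort(a, n); return; }
    int p = median3(a[0], a[n / 2], a[n - 1]);
    size_t mid = partition(a, n, p, 0);         /* left: keys <= p */
    if (mid < n) { hybrid_sort(a, mid); hybrid_sort(a + mid, n - mid); }
    else hybrid_sort(a, partition(a, n, p, 1)); /* no key > p: split off keys == p */
}
```

On AVX-512 hardware, each round of `bitonic_sort16` would roughly correspond to a lane permutation, a lane-wise min/max and a masked blend (for example `_mm512_permutexvar_epi32`, `_mm512_min_epi32` and `_mm512_max_epi32`). Each half of `compress_store` would correspond to a masked compress store such as `_mm512_mask_compressstoreu_epi32`. The scalar loops above only imitate those semantics. The second, strict partition in `hybrid_sort` is an added safeguard so that inputs with many equal keys always make progress.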