Parallel selection on GPUs

Abstract: We present a novel parallel selection algorithm for GPUs that handles both single rank selection (single selection) and multiple rank selection (multiselection). The algorithm makes no assumptions about the input data distribution and has a much lower recursion depth than many state-of-the-art algorithms. We implement the algorithm for different GPU generations, in each case leveraging the low-level communication features available on that generation, and assess its performance on server-line hardware. The computational complexity of our SampleSelect algorithm is comparable to that of specialized algorithms that are designed for, and exploit the characteristics of, “pleasant” data distributions. At the same time, because SampleSelect operates only on the ranks of the elements rather than on their values, it is robust to the input data and can complete significantly faster for adversarial data distributions. We also address approximate selection by designing a variant that radically reduces the computational cost while preserving high approximation accuracy.
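To make the idea of sample-based, rank-driven selection more concrete, here is a minimal sequential sketch in Python. It is not the authors' GPU implementation: the function name `sample_select`, the parameters `num_buckets`, `oversample`, and `base_case`, and the sort-based fallback for duplicate-heavy inputs are all assumptions made for illustration. It only shows the general pattern of drawing a sample, choosing splitters from it, counting elements per bucket, and recursing into the single bucket that contains the requested rank.

```python
import bisect
import random


def sample_select(data, k, num_buckets=8, oversample=4, base_case=32, seed=0):
    """Return the element of 0-based rank k in data (illustrative sketch)."""
    if not 0 <= k < len(data):
        raise IndexError("rank out of range")
    rng = random.Random(seed)
    items = list(data)
    while len(items) > base_case:
        # Draw a random sample (with replacement) and take evenly spaced
        # elements of the sorted sample as splitters.
        sample = sorted(rng.choice(items) for _ in range(num_buckets * oversample))
        splitters = [sample[i * oversample] for i in range(1, num_buckets)]
        # Count how many elements fall into each bucket.
        counts = [0] * num_buckets
        for x in items:
            counts[bisect.bisect_right(splitters, x)] += 1
        # Find the bucket holding rank k and make k relative to that bucket.
        b = 0
        while k >= counts[b]:
            k -= counts[b]
            b += 1
        bucket = [x for x in items if bisect.bisect_right(splitters, x) == b]
        if len(bucket) == len(items):
            # No progress (for example, all remaining elements are equal).
            # Assumption: simply fall through to the sort-based base case.
            break
        items = bucket
    return sorted(items)[k]
```

Because only the element counts per bucket decide where to continue, the number of rounds depends on how evenly the splitters divide the data by rank, not on the magnitude of the values. A call such as `sample_select(values, len(values) // 2)` would return an upper median under these assumptions.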
