Scalable and Efficient Parallel Selection

Selection algorithms find the \(k^{\mathrm {th}}\) smallest element in a set of elements. Although optimal parallel selection algorithms exist for theoretical machine models, they are difficult to implement and inefficient in practice. Consequently, scalable applications can use only a few special cases, such as minimum and maximum, for which efficient implementations exist. To overcome this limitation, we propose a general parallel selection algorithm that scales even on today’s largest supercomputers. Our approach combines an efficient, unbiased median approximation method, recently introduced as median-of-3 reduction, with Hoare’s sequential QuickSelect idea from 1961. The resulting algorithm achieves a time complexity of \(\mathcal {O}(\log ^2 n)\) for \(n\) distributed elements while requiring only \(\mathcal {O}(1)\) space. Furthermore, we demonstrate that it is a practical solution by explaining implementation details and presenting performance results for up to 458,752 processor cores.
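The combination of a median approximation with QuickSelect can be illustrated with a small sequential sketch. It is not the distributed algorithm described here. It assumes that median-of-3 reduction repeatedly replaces groups of three values with their median until a single candidate remains, and that leftover values at each level (when the count is not a multiple of three) are carried up unchanged; both the function names and this leftover rule are hypothetical. In a distributed setting, each grouping level would presumably correspond to a step of a reduction tree across processes, and the partition counts would be summed globally. This sketch also builds new lists, so unlike the claimed \(\mathcal {O}(1)\) space it uses linear extra memory.

```python
def median_of_3_reduction(values):
    """Approximate the median by repeatedly taking medians of groups of three.

    Hypothetical rule: leftover values (fewer than three at the end of a level)
    are carried up unchanged; once fewer than three values remain, the first
    one is returned.
    """
    level = list(values)
    while len(level) >= 3:
        nxt = []
        full = len(level) - len(level) % 3
        for i in range(0, full, 3):
            # The median of three values is the middle one after sorting.
            nxt.append(sorted(level[i:i + 3])[1])
        nxt.extend(level[full:])  # carry leftovers to the next level
        level = nxt
    return level[0]


def quickselect(values, k):
    """Return the k-th smallest element (1-based k, as in "k-th smallest")."""
    if not 1 <= k <= len(values):
        raise IndexError("k out of range")
    rank = k - 1  # 0-based rank within the current candidate set
    current = list(values)
    while True:
        # The pivot is always an element of `current`, so every round
        # discards at least one element and the loop terminates.
        pivot = median_of_3_reduction(current)
        less = [x for x in current if x < pivot]
        equal_count = sum(1 for x in current if x == pivot)
        if rank < len(less):
            current = less  # answer lies below the pivot
        elif rank < len(less) + equal_count:
            return pivot  # answer equals the pivot
        else:
            rank -= len(less) + equal_count
            current = [x for x in current if x > pivot]  # answer lies above


data = [7, 2, 9, 4, 1, 8, 3]
print(quickselect(data, 1), quickselect(data, 4), quickselect(data, 7))
```

Running the usage line prints `1 4 9`, the minimum, the median, and the maximum of `data`. Using an approximate median rather than a random pivot keeps each partition step balanced in expectation, which is the property the abstract relies on when it credits median-of-3 reduction for the algorithm's scalability.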