A New Variant of the Barrett Algorithm Applied to Quotient Selection

Quotient Selection (QS) is a key step in the classic $O(n^{2})$ multiple-precision division algorithm. On processors with fast hardware division it is a trivial problem, but on GPUs division is quite slow. In this paper we investigate the effectiveness of Brent and Zimmermann's variant of Barrett's algorithm, as well as our own novel variant. Our new approach is shown to be suitable for low-radix (single-precision) QS. We have developed three highly optimized implementations, two of the Brent and Zimmermann variant and one based on our new approach, and we show that each is many times faster than the division operation provided by the compiler. In addition, our variant is on average 22% faster than the other two implementations. We also sketch proofs of correctness for our new algorithm and for all of the implementations.
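
For readers unfamiliar with the underlying idea, the following is a minimal sketch of a textbook Barrett-style quotient estimate. It is not the Brent and Zimmermann variant, nor the new variant proposed here. It only illustrates how a division by a fixed divisor can be replaced by a multiplication with a precomputed reciprocal, a shift, and a single correction step. The names `WORD_BITS`, `barrett_precompute`, and `barrett_divmod` are hypothetical, and the sketch assumes a positive divisor and a numerator below $2^{64}$.

```python
# Textbook Barrett-style division sketch (illustrative only; not the paper's variant).
# Assumption: 0 <= n < 2**WORD_BITS and d >= 1.
WORD_BITS = 64


def barrett_precompute(d):
    """Return mu = floor(2**WORD_BITS / d); computed once per divisor."""
    if d <= 0:
        raise ValueError("divisor must be positive")
    return (1 << WORD_BITS) // d


def barrett_divmod(n, d, mu):
    """Return (n // d, n % d) using a multiply and shift instead of a division."""
    if not 0 <= n < (1 << WORD_BITS):
        raise ValueError("numerator out of the assumed range")
    # Quotient estimate: never too large, and too small by at most one
    # under the stated bound on n.
    q = (n * mu) >> WORD_BITS
    r = n - q * d
    # Single correction step for the case where the estimate is one short.
    if r >= d:
        q += 1
        r -= d
    return q, r


if __name__ == "__main__":
    d = 0xFFFFFFFB  # example single-word divisor (arbitrary choice)
    mu = barrett_precompute(d)
    for n in (0, 1, d - 1, d, d + 1, 12345678901234567890, (1 << 64) - 1):
        assert barrett_divmod(n, d, mu) == divmod(n, d)
```

The single correction suffices because $\mu d \le 2^{64} < (\mu + 1) d$, so for $n < 2^{64}$ the estimate $\lfloor n\mu / 2^{64} \rfloor$ lies within one of the true quotient $\lfloor n/d \rfloor$ and never exceeds it.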