Comparison of Modular Arithmetic Algorithms on GPUs

We present below our first implementation results on a modular arithmetic library on GPUs for cryptography. Our library, in C++ for CUDA, provides modular arithmetic, finite field arithmetic and some ECC support. Several algorithms and memory coding styles have been compared: local, shared and register. For moderate sizes, we report up to 2.6 speedup compared to state-of-the-art library.