High throughput multiple-precision GCD on the CUDA architecture

Investigating the cryptanalytic strength of RSA cryptography requires computing the GCDs of many pairs of long integers (e.g., 1024 bits each). This paper presents a high-throughput parallel algorithm that performs many GCD computations concurrently on a GPU based on the CUDA architecture. Experiments with an NVIDIA GeForce GTX 285 GPU and a single core of a 3.0 GHz Intel Core 2 Duo E6850 CPU show that the proposed GPU algorithm runs 11.3 times faster than the corresponding CPU algorithm.
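
To illustrate why many GCDs are needed, the following minimal sketch checks every pair in a set of moduli for a common factor: if two RSA moduli share a prime, their GCD reveals it. This is a sequential Python illustration, not the GPU algorithm of this paper. The abstract does not say which GCD algorithm is used, so the binary (Stein) GCD here is an assumption, chosen because it needs only shifts and subtractions. The names binary_gcd and shared_factors are hypothetical, and the toy moduli are built from small primes rather than 1024-bit values.

```python
from itertools import combinations


def binary_gcd(a: int, b: int) -> int:
    """Binary (Stein) GCD for non-negative integers, using shifts and subtraction."""
    if a == 0:
        return b
    if b == 0:
        return a
    # Number of factors of two common to both inputs.
    shift = ((a | b) & -(a | b)).bit_length() - 1
    # Strip all factors of two from a so that it is odd.
    a >>= (a & -a).bit_length() - 1
    while b != 0:
        # Strip all factors of two from b so that it is odd.
        b >>= (b & -b).bit_length() - 1
        if a > b:
            a, b = b, a
        # Both are odd and a <= b, so the difference is even and non-negative.
        b -= a
    # Restore the common factors of two.
    return a << shift


def shared_factors(moduli):
    """Return (i, j, g) for every pair of moduli whose GCD g exceeds 1."""
    hits = []
    for (i, n), (j, m) in combinations(enumerate(moduli), 2):
        g = binary_gcd(n, m)
        if g > 1:
            hits.append((i, j, g))
    return hits


# Toy example: the first and third moduli share the prime 103.
moduli = [101 * 103, 107 * 109, 103 * 113]
print(shared_factors(moduli))  # prints [(0, 2, 103)]
```

Each pair is independent of every other pair, so for k moduli there are k(k-1)/2 GCD computations that can proceed concurrently. That independence is what makes the workload a natural fit for a throughput-oriented device such as a GPU.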