Utilizing the Double-Precision Floating-Point Computing Power of GPUs for RSA Acceleration

Implementations of asymmetric cryptographic algorithms (e.g., RSA and Elliptic Curve Cryptography) on Graphics Processing Units (GPUs) have been researched for over a decade. Most previous contributions exploit the highly parallel GPU architecture by porting integer-based algorithms from general-purpose CPUs to GPUs to achieve high performance. However, the cryptographic computing potential of GPUs, especially that of their more powerful floating-point instructions, has not been comprehensively investigated. In this paper, we fully exploit the floating-point computing power of GPUs through several designs, including a floating-point-based Montgomery multiplication/exponentiation algorithm and a Chinese Remainder Theorem (CRT) implementation on the GPU. For practical use of the proposed algorithm, we introduce a new method for converting the input and output between octet strings and floating-point numbers; it fully utilizes the GPU and improves overall performance by a further 5% or so. RSA-2048/3072/4096 decryption on an NVIDIA GeForce GTX TITAN reaches 42,211/12,151/5,790 operations per second, respectively, which is 13 times the performance of the previous fastest floating-point-based implementation (published in Eurocrypt 2009). The RSA-4096 decryption outperforms the fastest existing integer-based result by 23%.
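
For readers unfamiliar with the two building blocks named above, the following is a minimal sketch of Montgomery exponentiation combined with CRT-based RSA decryption. It is written in Python with ordinary integers and a toy key, so it illustrates only the arithmetic structure, not the paper's floating-point GPU kernels or its limb representation. All function names (`mont_setup`, `mont_mul`, `mont_pow`, `rsa_crt_decrypt`) are hypothetical and chosen for illustration.

```python
# Illustrative integer-only sketch (assumes Python 3.8+ for pow(x, -1, m)).
# Not the paper's implementation: no floating-point limbs, no GPU code.

def mont_setup(n, k):
    """Precompute Montgomery constants for an odd modulus n with R = 2**k > n."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r      # satisfies n * n_prime == -1 (mod R)
    return r, n_prime

def mont_mul(a, b, n, k, n_prime):
    """Return a * b * R^-1 mod n for 0 <= a, b < n, without dividing by n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> k                # exact division by R
    return u - n if u >= n else u       # one conditional subtraction suffices

def mont_pow(base, exp, n):
    """Left-to-right binary exponentiation in Montgomery form (n must be odd)."""
    k = n.bit_length()
    r, n_prime = mont_setup(n, k)
    x = (base % n) * r % n              # convert base to Montgomery form
    acc = r % n                         # Montgomery form of 1
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)
    return mont_mul(acc, 1, n, k, n_prime)  # convert back to normal form

def rsa_crt_decrypt(c, p, q, d):
    """RSA decryption via CRT: two half-size exponentiations, then recombination."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    mp = mont_pow(c, dp, p)
    mq = mont_pow(c, dq, q)
    h = (q_inv * (mp - mq)) % p         # Garner's recombination step
    return mq + h * q

if __name__ == "__main__":
    p, q, e, d = 61, 53, 17, 2753       # toy textbook key, insecure by design
    message = 65
    ciphertext = pow(message, e, p * q)
    print(rsa_crt_decrypt(ciphertext, p, q, d) == message)
```

The CRT split matters for performance because each exponentiation works on a modulus half the size of the RSA modulus. A GPU implementation of the kind the abstract describes would replace the big-integer operations in `mont_mul` with limb-wise arithmetic, which this sketch does not attempt to model.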
