Faster Number Theoretic Transform on Graphics Processors for Ring Learning with Errors Based Cryptography

The Number Theoretic Transform (NTT) has been revived recently by the advent of Ring Learning with Errors (Ring-LWE) Homomorphic Encryption (HE) schemes. In these schemes, the NTT is used to calculate the product of high-degree polynomials with multi-precision coefficients in quasilinear time. This polynomial multiplication is known to be the most time-consuming operation in Ring-based HE schemes. Therefore, accelerating the NTT is key to realizing efficient implementations. For this reason, cuHE, a publicly available HE library written for the Compute Unified Device Architecture (CUDA), includes a fast NTT implementation in its current version. We analyzed the cuHE NTT kernels and found that they suffer from two performance pitfalls: shared memory conflicts and thread divergence. We show that a set of CUDA-tailored optimizations improves the speed of the cuHE NTT computation by 20%-50% across different problem sizes.
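
As a rough illustration of how an NTT yields polynomial products in quasilinear time, the following is a minimal CPU sketch in Python. It is not the cuHE kernel and not the optimized GPU code described above. The prime `MOD = 998244353`, its primitive root `ROOT = 3`, and the function names `ntt` and `poly_mul` are assumptions chosen for readability. The sketch computes an ordinary (zero-padded) product, whereas Ring-LWE schemes additionally reduce the result modulo a ring polynomial.

```python
# Illustrative parameters (assumptions, not taken from cuHE):
# MOD = 119 * 2**23 + 1 is prime, so transforms up to length 2**23 exist.
MOD = 998244353
ROOT = 3  # a primitive root modulo MOD


def ntt(values, invert=False):
    """Iterative radix-2 NTT over Z_MOD; len(values) must be a power of two."""
    a = list(values)
    n = len(a)

    # Bit-reversal permutation so butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) butterfly stages, each doing n/2 modular multiplications.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)  # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # modular inverse via Fermat's little theorem
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1

    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def poly_mul(f, g):
    """Multiply two coefficient lists modulo MOD using forward and inverse NTTs."""
    out_len = len(f) + len(g) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fa = ntt(list(f) + [0] * (size - len(f)))
    ga = ntt(list(g) + [0] * (size - len(g)))
    pointwise = [x * y % MOD for x, y in zip(fa, ga)]
    return ntt(pointwise, invert=True)[:out_len]
```

For example, `poly_mul([1, 2, 3], [4, 5])` multiplies 1 + 2x + 3x^2 by 4 + 5x. Python's arbitrary-precision integers hide the modular-reduction cost that a GPU implementation has to manage explicitly, and the sequential butterfly loops here say nothing about the memory access patterns or branching behavior that the optimizations above target.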
