Accelerating fully homomorphic encryption using GPU

As a major breakthrough, in 2009 Gentry introduced the first plausible construction of a fully homomorphic encryption (FHE) scheme. FHE allows the evaluation of arbitrary functions directly on encrypted data on untwisted servers. In 2010, Gentry and Halevi presented the first FHE implementation on an IBM x3500 server. However, this implementation remains impractical due to the high latency of encryption and recryption. The Gentry-Halevi (GH) FHE primitives utilize multi-million-bit modular multiplications and additions which are time-consuming tasks for a general purpose computer. In the GH-FHE implementation, the most computationally intensive arithmetic operation is modular multiplication. In this paper, the million-bit modular multiplication is computed in two steps. For large number multiplication, Strassen's FFT based algorithm is employed and accelerated on a graphics processing unit (GPU) through its massive parallelism. Subsequently, Barrett modular reduction algorithm is applied to implement modular reduction. As an experimental study, we implement the GH-FHE primitives for the small setting with a dimension of 2048 on NVIDIA C2050 GPU. The experimental results show the speedup factors of 7.68, 7.4 and 6.59 for encryption, decryption and recrypt respectively, when compared with the existing CPU implementation.

[1]  Paul Barrett,et al.  Implementing the Rivest Shamir and Adleman Public Key Encryption Algorithm on a Standard Digital Signal Processor , 1986, CRYPTO.

[2]  Diego Fasoli,et al.  Three Applications of GPU Computing in Neuroscience , 2012, Computing in Science & Engineering.

[3]  Arnold Schönhage,et al.  Schnelle Multiplikation großer Zahlen , 1971, Computing.

[4]  Dinesh Manocha,et al.  Fast and approximate stream mining of quantiles and frequencies using graphics processors , 2005, SIGMOD '05.

[5]  Craig Gentry,et al.  Implementing Gentry's Fully-Homomorphic Encryption Scheme , 2011, EUROCRYPT.

[6]  J. Solinas CORR 99-39 Generalized Mersenne Numbers , 1999 .

[7]  P. L. Montgomery Modular multiplication without trial division , 1985 .

[8]  William P. Marnane,et al.  Efficient architectures for implementing montgomery modular multiplication and RSA modular exponentiation on reconfigurable logic , 2002, FPGA '02.

[9]  Craig Gentry,et al.  A fully homomorphic encryption scheme , 2009 .

[10]  Yifeng Chen,et al.  Improving Performance of Matrix Multiplication and FFT on GPU , 2009, 2009 15th International Conference on Parallel and Distributed Systems.

[11]  Charles C. Weems,et al.  High Precision Integer Multiplication with a GPU Using Strassen's Algorithm with Multiple FFT Sizes , 2011, Parallel Process. Lett..

[12]  Arnaud Tisserand,et al.  Comparison of Modular Arithmetic Algorithms on GPUs , 2009, PARCO.

[13]  Ronald L. Rivest,et al.  ON DATA BANKS AND PRIVACY HOMOMORPHISMS , 1978 .

[14]  M. McLoone,et al.  Fast Montgomery modular multiplication and RSA cryptographic processor architectures , 2003, The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003.

[15]  Chris Peikert,et al.  An update on SIPHER (Scalable Implementation of Primitives for Homomorphic EncRyption) — FPGA implementation using Simulink , 2012, 2012 IEEE Conference on High Performance Extreme Computing.

[16]  Craig Gentry,et al.  Fully homomorphic encryption using ideal lattices , 2009, STOC '09.