Efficient polynomial multiplier architecture for Ring-LWE based public key cryptosystems

The most critical and computationally intensive operation of Ring-LWE based public key cryptosystems is polynomial multiplication. In this paper, we introduce several optimization techniques to speed up polynomial multiplication with the number theoretic transform (NTT). We propose to pre-compute the NTT of the constant polynomial to reduce the number of NTT computations. To reduce the cost of the bit-reversal operation, an optimization technique is introduced to perform it on-the-fly. We also present a technique to improve the utilization rate of the butterfly operator. Moreover, the cancellation lemma is exploited to reduce the required ROM storage. Based on these optimizations, we present a versatile pipelined polynomial multiplication architecture, which takes around (n lg n + 1.5n) clock cycles to calculate the product of two n-degree polynomials. Experimental results on a Spartan-6 FPGA show that our polynomial multiplier achieves a speedup of 2.04 on average and consumes fewer hardware resources than state-of-the-art efficient implementations.
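As a software illustration of the first optimization, the following minimal sketch multiplies two polynomials in Z_q[x]/(x^n + 1) with the NTT and reuses a pre-computed transform of the constant polynomial. It is not the hardware architecture described above: the toy parameters (n = 8, q = 17, psi = 3), the function names, and the ring x^n + 1 are assumptions chosen for illustration, and the bit-reversal is done as a separate pass rather than on-the-fly.

```python
# Toy parameters (assumed for illustration; real Ring-LWE schemes use larger n and q).
N = 8                    # number of coefficients, a power of two
Q = 17                   # prime modulus with Q = 1 (mod 2N)
PSI = 3                  # primitive 2N-th root of unity mod Q (3^8 = -1 mod 17)
OMEGA = PSI * PSI % Q    # primitive N-th root of unity mod Q


def bit_reverse(a):
    """Return a copy of `a` permuted into bit-reversed index order."""
    n = len(a)
    bits = n.bit_length() - 1
    out = [0] * n
    for i in range(n):
        r = int(format(i, f"0{bits}b")[::-1], 2)
        out[r] = a[i]
    return out


def ntt(a, omega, q):
    """Iterative decimation-in-time NTT: A[k] = sum_j a[j] * omega^(j*k) mod q."""
    a = bit_reverse(a)
    n = len(a)
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)   # primitive `length`-th root of unity
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for j in range(half):
                # One butterfly operation on a pair of coefficients.
                u = a[start + j]
                v = a[start + j + half] * w % q
                a[start + j] = (u + v) % q
                a[start + j + half] = (u - v) % q
                w = w * w_len % q
        length *= 2
    return a


def forward_negacyclic(a):
    """Scale coefficient i by PSI^i, then apply the NTT."""
    scaled = [a[i] * pow(PSI, i, Q) % Q for i in range(N)]
    return ntt(scaled, OMEGA, Q)


def multiply_with_precomputed(a_hat, b):
    """Multiply by a polynomial whose transform a_hat was computed once in advance."""
    b_hat = forward_negacyclic(b)
    c_hat = [x * y % Q for x, y in zip(a_hat, b_hat)]
    c = ntt(c_hat, pow(OMEGA, -1, Q), Q)     # inverse transform, still scaled by N
    n_inv = pow(N, -1, Q)
    psi_inv = pow(PSI, -1, Q)
    return [c[i] * n_inv * pow(psi_inv, i, Q) % Q for i in range(N)]


def schoolbook_negacyclic(a, b, q):
    """Reference O(n^2) product modulo x^n + 1, used only to check the result."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q   # x^n = -1
    return c


# Usage: transform the constant polynomial once, then reuse it per multiplication.
constant_poly = [1, 2, 3, 4, 5, 6, 7, 8]
constant_hat = forward_negacyclic(constant_poly)
product = multiply_with_precomputed(constant_hat, [8, 7, 6, 5, 4, 3, 2, 1])
```

With the transform of the constant polynomial stored, each multiplication needs only one forward NTT and one inverse NTT instead of two forward and one inverse, which is the saving the pre-computation is aiming at.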