Low-Latency VLSI Architectures for Modular Polynomial Multiplication via Fast Filtering and Applications to Lattice-Based Cryptography

This paper presents a low-latency hardware accelerator for modular polynomial multiplication, targeting lattice-based post-quantum cryptography and homomorphic encryption. The proposed modular polynomial multiplier exploits the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of schoolbook modular polynomial multiplication. We also extend this structure to fast M-parallel architectures that achieve low latency, high speed, and full hardware utilization. We evaluate the proposed architectures under various polynomial settings and, as a case study, in the Saber post-quantum cryptography scheme. The experimental results show that our design reduces the computational time and area-time product by 61% and 32%, respectively, compared to state-of-the-art designs.
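To make the complexity reduction concrete, the following is a minimal software sketch of a two-parallel fast-FIR-style decomposition, not the hardware architecture of the paper. It assumes the ring Z_q[x]/(x^n + 1) with even n, which is the setting used by Saber (n = 256, q = 2^13); the function names and the small parameters in the usage note are illustrative assumptions. Splitting each operand into even and odd polyphase components replaces four half-size products with three, at the cost of a few extra additions.

```python
def schoolbook_negacyclic(a, b, q):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1), n = len(a)."""
    n = len(a)
    assert len(b) == n
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^n = -1, so the wrapped term is subtracted
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


def fast_two_parallel_negacyclic(a, b, q):
    """Same product using three half-size products (assumed sketch).

    With y = x^2, write A(x) = A0(y) + x*A1(y) and B(x) = B0(y) + x*B1(y).
    Then A*B = (A0*B0 + y*A1*B1) + x*((A0+A1)*(B0+B1) - A0*B0 - A1*B1),
    and x^n + 1 = y^(n/2) + 1, so each half-size product is itself negacyclic.
    """
    n = len(a)
    assert n % 2 == 0 and len(b) == n
    a0, a1 = a[0::2], a[1::2]  # even and odd polyphase components
    b0, b1 = b[0::2], b[1::2]

    p0 = schoolbook_negacyclic(a0, b0, q)
    p1 = schoolbook_negacyclic(a1, b1, q)
    sa = [(u + v) % q for u, v in zip(a0, a1)]
    sb = [(u + v) % q for u, v in zip(b0, b1)]
    p2 = schoolbook_negacyclic(sa, sb, q)

    # Multiply p1 by y modulo y^(n/2) + 1: rotate by one and negate the wrap.
    y_p1 = [(-p1[-1]) % q] + p1[:-1]

    c_even = [(u + v) % q for u, v in zip(p0, y_p1)]
    c_odd = [(w - u - v) % q for w, u, v in zip(p2, p0, p1)]

    # Interleave the even and odd output phases.
    c = [0] * n
    c[0::2] = c_even
    c[1::2] = c_odd
    return c
```

For small inputs such as `a = [1, 2, 3, 4, 5, 6, 7, 8]` and `b = [8, 7, 6, 5, 4, 3, 2, 1]` with `q = 8192`, both functions should return the same coefficient list. Applying the same split recursively, or splitting into M phases instead of two, is one way to read the M-parallel extension mentioned above, although the paper's actual datapath may differ.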
