CoHA-NTT: A Configurable Hardware Accelerator for NTT-based Polynomial Multiplication

In this paper, we introduce a configurable hardware architecture that generates unified and parametric NTT-based polynomial multipliers supporting a wide range of parameters of the lattice-based cryptographic schemes proposed for post-quantum cryptography. The unified butterfly unit of our architecture, the core building block of NTT operations, performs both the NTT and the inverse NTT. The number of these units plays an essential role in achieving the performance goals of a specific application area or platform. To this end, the architecture takes the number of butterfly units as input and generates an efficient NTT-based polynomial multiplier that meets the desired throughput and area requirements. More specifically, the proposed hardware architecture provides run-time configurability for the scheme parameters and compile-time configurability for throughput and area requirements. To the best of our knowledge, this work presents the first architecture with both run-time and compile-time configurability for NTT-based polynomial multiplication. The implementation results indicate that the advanced configurability has a negligible impact on the time and area of the proposed architecture and that its performance is on par with, if not better than, the state-of-the-art implementations in the literature. The proposed architecture comprises various subblocks, such as modular multiplier and butterfly units, each of which can be of interest on its own for accelerating lattice-based cryptography. Thus, we provide the design rationale of each subblock and compare it with those in the literature, including our earlier works, in terms of configurability and performance.
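To make the role of a unified butterfly concrete, the following is a minimal software sketch, not the hardware design described above. It assumes toy parameters (n = 8, q = 17, psi = 3) chosen only for illustration, and all function names are hypothetical. One butterfly routine serves both transforms: in Gentleman-Sande mode for the forward NTT and in Cooley-Tukey mode for the inverse NTT. Multiplication modulo x^n + 1 is obtained here by pre- and post-scaling with powers of psi, which may differ from how the architecture handles these factors.

```python
# Toy parameters (assumptions for illustration only, not taken from the paper):
# N coefficients, prime modulus Q with 2N | Q - 1, PSI a primitive 2N-th root of unity.
N, Q, PSI = 8, 17, 3
OMEGA = PSI * PSI % Q  # primitive N-th root of unity


def butterfly(a, b, w, q, mode):
    """Hypothetical unified butterfly: one routine, two modes."""
    if mode == "ct":  # Cooley-Tukey: multiply first, then add/subtract
        t = w * b % q
        return (a + t) % q, (a - t) % q
    # Gentleman-Sande: add/subtract first, then multiply
    return (a + b) % q, (a - b) * w % q


def ntt(x, omega, q):
    """Forward cyclic NTT (decimation in frequency); output is bit-reversed."""
    x, n = list(x), len(x)
    length = n // 2
    while length >= 1:
        w_m = pow(omega, n // (2 * length), q)
        for start in range(0, n, 2 * length):
            w = 1
            for j in range(start, start + length):
                x[j], x[j + length] = butterfly(x[j], x[j + length], w, q, "gs")
                w = w * w_m % q
        length //= 2
    return x


def intt(x, omega, q):
    """Inverse cyclic NTT (decimation in time); input is bit-reversed."""
    x, n = list(x), len(x)
    omega_inv = pow(omega, -1, q)  # modular inverse, needs Python 3.8+
    length = 1
    while length < n:
        w_m = pow(omega_inv, n // (2 * length), q)
        for start in range(0, n, 2 * length):
            w = 1
            for j in range(start, start + length):
                x[j], x[j + length] = butterfly(x[j], x[j + length], w, q, "ct")
                w = w * w_m % q
        length *= 2
    n_inv = pow(n, -1, q)
    return [v * n_inv % q for v in x]


def polymul_ntt(a, b, q=Q, psi=PSI):
    """Multiply a and b modulo (x^n + 1, q) using the NTT."""
    n = len(a)
    omega = psi * psi % q
    psi_inv = pow(psi, -1, q)
    # Pre-scale by psi^i so that cyclic convolution becomes negacyclic.
    a_s = [a[i] * pow(psi, i, q) % q for i in range(n)]
    b_s = [b[i] * pow(psi, i, q) % q for i in range(n)]
    fa, fb = ntt(a_s, omega, q), ntt(b_s, omega, q)
    fc = [u * v % q for u, v in zip(fa, fb)]  # coefficient-wise product
    c_s = intt(fc, omega, q)
    # Post-scale by psi^-i to undo the pre-scaling.
    return [c_s[i] * pow(psi_inv, i, q) % q for i in range(n)]


def polymul_schoolbook(a, b, q=Q):
    """Reference O(n^2) multiplication modulo (x^n + 1, q)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1  # x^n wraps around to -1
            c[(i + j) % n] = (c[(i + j) % n] + sign * a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [8, 7, 6, 5, 4, 3, 2, 1]
    print(polymul_ntt(a, b) == polymul_schoolbook(a, b))
```

Because the forward transform leaves its output in bit-reversed order and the inverse transform expects that order, no explicit permutation is needed between them; the coefficient-wise product is order-independent as long as both operands share the same ordering.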
