A High-performance Hardware Implementation of Saber Based on Karatsuba Algorithm

Although a large number of hardware and software implementations have been proposed to accelerate lattice-based cryptography, Saber, a module-LWR-based algorithm that has advanced to the second round of the NIST standardization process, is not yet adequately supported by current solutions. Motivated by this gap, this paper proposes a high-performance crypto-processor based on algorithm-hardware co-design. First, a hierarchical Karatsuba calculation framework, a hardware-efficient Karatsuba scheduling strategy, and an optimized circuit structure are used to enable high-throughput polynomial multiplication. Furthermore, a task-level pipeline and truncated multipliers are proposed to enable algorithm-specific, fine-grained processing. Enabled by these optimizations, our processor takes 943, 1156, and 408 clock cycles for key generation, encryption, and decryption of Saber768, respectively, achieving 5.4×, 5.2×, and 4.2× reductions compared with state-of-the-art FPGA solutions. The design is implemented in a TSMC 40 nm CMOS process with an area of 0.35 mm² and evaluated by post-layout simulation. Operating at 400 MHz, it reaches a throughput of up to 346k encryption operations per second for Saber768 and an energy efficiency of 0.12 µJ per encryption. These figures are nearly 52× and 30× improvements, respectively, over current PQC hardware solutions.
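The core operation accelerated here is polynomial multiplication in Saber's ring R_q = Z_q[x]/(x^256 + 1) with q = 2^13. Because q is a power of two, the NTT is not directly applicable, so Karatsuba-style multiplication is a natural choice. The Python sketch below is a minimal software illustration of that idea and is not the paper's hardware design. Each Karatsuba level replaces four half-size products with three. The recursion then falls back to schoolbook multiplication below a cut-over size, and the result is folded negacyclically using x^256 = -1 and reduced mod q with a bit mask. The cut-over `THRESHOLD = 16` and the function names `karatsuba`, `polymul_saber`, and `polymul_reference` are hypothetical choices for illustration. They do not represent the paper's hierarchical Karatsuba schedule.

```python
import random

N = 256          # Saber polynomial degree bound: R_q = Z_q[x]/(x^256 + 1)
Q = 1 << 13      # Saber modulus q = 2^13, so reduction mod q is a bit mask
THRESHOLD = 16   # hypothetical schoolbook cut-over; not taken from the paper


def schoolbook(a, b):
    """Plain O(n^2) product of two equal-length coefficient lists (no reduction)."""
    out = [0] * (2 * len(a) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def karatsuba(a, b):
    """Recursive Karatsuba product; len(a) == len(b) must be a power of two."""
    n = len(a)
    if n <= THRESHOLD:
        return schoolbook(a, b)
    h = n // 2
    a_lo, a_hi = a[:h], a[h:]
    b_lo, b_hi = b[:h], b[h:]
    low = karatsuba(a_lo, b_lo)    # a_lo * b_lo
    high = karatsuba(a_hi, b_hi)   # a_hi * b_hi
    mid = karatsuba([x + y for x, y in zip(a_lo, a_hi)],
                    [x + y for x, y in zip(b_lo, b_hi)])
    # (a_lo + a_hi)(b_lo + b_hi) - low - high = a_lo*b_hi + a_hi*b_lo
    mid = [m - lo - hi for m, lo, hi in zip(mid, low, high)]
    out = [0] * (2 * n - 1)
    for i, v in enumerate(low):
        out[i] += v
    for i, v in enumerate(mid):
        out[i + h] += v
    for i, v in enumerate(high):
        out[i + 2 * h] += v
    return out


def polymul_saber(a, b):
    """Multiply a, b in Z_q[x]/(x^N + 1): Karatsuba, then negacyclic fold, then mod q."""
    full = karatsuba(a, b)
    res = [0] * N
    for i, v in enumerate(full):
        if i < N:
            res[i] += v
        else:
            res[i - N] -= v  # x^N = -1 in this ring
    # Python's & on negative ints acts on infinite two's complement,
    # so masking with Q - 1 yields the value mod 2^13 in [0, Q).
    return [c & (Q - 1) for c in res]


def polymul_reference(a, b):
    """Direct negacyclic schoolbook product, used only to check the Karatsuba path."""
    res = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                res[k] += a[i] * b[j]
            else:
                res[k - N] -= a[i] * b[j]
    return [c % Q for c in res]


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.randrange(Q) for _ in range(N)]
    b = [rng.randrange(Q) for _ in range(N)]
    assert polymul_saber(a, b) == polymul_reference(a, b)
    print("Karatsuba result matches schoolbook reference")
```

With N = 256 and this threshold, the recursion applies four Karatsuba levels before reaching 16-coefficient schoolbook products. A hardware design would instead choose the split depth and base-case size to balance multiplier count against cycle count.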
