Sapphire: A Configurable Crypto-Processor for Post-Quantum Lattice-based Protocols

Public-key cryptography protocols, such as RSA and elliptic curve cryptography, will be rendered insecure by Shor’s algorithm when large-scale quantum computers are built. Cryptographers are working on quantum-resistant algorithms, and lattice-based cryptography has emerged as a prime candidate. However, the high computational complexity of these algorithms makes it challenging to implement lattice-based protocols on low-power embedded devices. To address this challenge, we present Sapphire, a lattice cryptography processor with configurable parameters. Efficient sampling with a SHA-3-based PRNG provides two orders of magnitude energy savings, a single-port RAM-based number theoretic transform memory architecture provides 124k-gate area savings, and a low-power modular arithmetic unit accelerates polynomial computations. Our test chip was fabricated in a TSMC 40 nm low-power CMOS process, with the Sapphire cryptographic core occupying 0.28 mm² and consisting of 106k logic gates and 40.25 KB of SRAM. Sapphire can be programmed with custom instructions for polynomial arithmetic and sampling, and it is coupled with a low-power RISC-V microprocessor to demonstrate the NIST Round 2 lattice-based CCA-secure key encapsulation and signature protocols Frodo, NewHope, qTESLA, CRYSTALS-Kyber and CRYSTALS-Dilithium, achieving up to an order of magnitude improvement in performance and energy efficiency compared to state-of-the-art hardware implementations. All key building blocks of Sapphire are constant-time and secure against timing and simple power analysis side-channel attacks. We also discuss how masking-based DPA countermeasures can be implemented on the Sapphire core without any changes to the hardware.
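The SHA-3-based sampling mentioned in the abstract can be illustrated in software with a minimal sketch. This is not the Sapphire hardware sampler: it only shows the general idea of expanding a seed with a SHA-3 extendable-output function and rejection-sampling polynomial coefficients modulo q. The function name `sample_uniform_poly`, the choice of SHAKE-128, the little-endian candidate encoding, and the defaults n = 256 and q = 12289 are all assumptions made for illustration, and the sketch makes no attempt to model timing or power behavior.

```python
import hashlib


def sample_uniform_poly(seed: bytes, n: int = 256, q: int = 12289) -> list:
    """Rejection-sample n coefficients uniformly from [0, q) using SHAKE-128.

    Hypothetical helper for illustration only; parameters are assumptions.
    """
    bits = (q - 1).bit_length()       # bits needed to represent values below q
    mask = (1 << bits) - 1            # keep only those low bits of each candidate
    nbytes = (bits + 7) // 8          # bytes consumed per candidate

    out_len = 2 * n * nbytes          # initial guess for how much output is needed
    while True:
        # SHAKE output is prefix-consistent, so asking for a longer digest
        # extends the same stream rather than producing a different one.
        stream = hashlib.shake_128(seed).digest(out_len)
        coeffs = []
        for i in range(0, out_len - nbytes + 1, nbytes):
            cand = int.from_bytes(stream[i:i + nbytes], "little") & mask
            if cand < q:              # accept; otherwise reject and move on
                coeffs.append(cand)
                if len(coeffs) == n:
                    return coeffs
        out_len *= 2                  # not enough accepted samples; extend stream


if __name__ == "__main__":
    poly = sample_uniform_poly(b"example seed")
    print(len(poly), min(poly), max(poly))
```

With q = 12289 each candidate uses 14 bits, so roughly three out of four candidates are accepted; rejecting out-of-range values instead of reducing them modulo q keeps the accepted coefficients uniformly distributed.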
