OSKR/OKAI: Systematic Optimization of Key Encapsulation Mechanisms from Module Lattice

In this work, we systematically optimize key encapsulation mechanisms (KEMs) based on module learning-with-errors (MLWE), covering algorithmic design, the fundamental number-theoretic transform (NTT) operation, approaches to expanding the encapsulated key size, and optimized implementation. We focus on Kyber (now a Round-3 finalist in the NIST PQC standardization) and Aigis (a variant of Kyber proposed at PKC 2020). Through careful analysis, we first observe that the algorithmic design of Kyber and Aigis can be optimized with the mechanism of asymmetric key consensus with noise (AKCN) proposed in [12,13]. Specifically, the decryption process can be simplified with AKCN, making it both faster and less error-prone. Moreover, the AKCN-based optimized version is perfectly compatible with real-world deployments of Kyber/Aigis, as they can run on the same parameters, the same public key, and the same encryption process. We also make a systematic study of the NTT variants proposed in recent years to extend its scope of applicability, analyze their exact computational complexity, and in particular show their equivalence. We then present a new variant named hybrid-NTT (H-NTT), which combines the advantages of existing NTT methods, and derive its optimality in computational complexity. The H-NTT technique not only has a larger scope of applicability but also allows for modular and unified implementation code for NTT operations, even with varying module dimensions. We analyze and compare the different approaches to expanding the size of the key to be encapsulated (specifically, a 512-bit key for dimension 1024), and identify the most economical approach. To mitigate the compatibility issue in implementations, we adopt the proposed H-NTT method. Each of the above optimization techniques is of independent value, and we apply all of them to Kyber and Aigis, resulting in new protocol variants named OSKR and OKAI, respectively.
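The abstract does not spell out how AKCN works, so the following is only a rough sketch of a rounding-based encode/decode pair of the kind usually described as AKCN in the key-consensus literature. The function names (`con`, `rec`, `round_div`), the parameter names and the concrete values of `M`, `G` and `D` are assumptions chosen for illustration; they are not taken from this work and are not the OSKR/OKAI parameter sets.

```python
# Illustrative parameters (assumptions, not the parameters of OSKR or OKAI).
Q = 3329   # modulus; Kyber's q, used here only as a familiar example value
M = 2      # each coefficient carries one value k in {0, ..., M-1}
G = 16     # the hint v takes values in {0, ..., G-1}
D = 727    # largest noise satisfying (2*D + 1) * M < Q * (1 - M / G)


def round_div(num, den):
    """Nearest integer to num/den (den > 0), ties rounded up, integers only."""
    return (2 * num + den) // (2 * den)


def con(sigma1, k):
    """Sender side: fold the chosen value k into a small hint v."""
    return round_div(G * (sigma1 + round_div(k * Q, M)), Q) % G


def rec(sigma2, v):
    """Receiver side: recover k from the hint v and a noisy copy of sigma1."""
    return round_div(M * (v * Q - sigma2 * G), G * Q) % M


if __name__ == "__main__":
    sigma1, k, noise = 1234, 1, -500
    v = con(sigma1, k)
    # The receiver only knows sigma2, which differs from sigma1 by the noise.
    print(rec((sigma1 + noise) % Q, v) == k)
```

In this sketch the receiver's side is a single scaled subtraction followed by a rounding, which is the kind of simplification of decryption that the abstract attributes to AKCN. Recovery is guaranteed only while the noise stays within `D`.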
For all the new protocol variants proposed in this work, we provide both AVX2 and ARM Cortex-M4 implementations and present performance benchmarks. Through thorough implementation optimizations, our AVX2 implementation improves efficiency by 17.39% over Kyber-512, by 11.31% over Kyber-768, and by 34.26% over Kyber-1024. Meanwhile, our work shows 53.96%, 25.00%, and 49.08% improvements in speed and an 82.57% reduction in pre-computed root storage compared to Aigis. Also, to the best of our knowledge, our work is the first to present ARM Cortex-M4 implementations for the variants of Aigis.
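For readers unfamiliar with the NTT operation discussed above, here is a minimal textbook sketch of NTT-based multiplication in Z_q[x]/(x^n + 1). It is not H-NTT and not the transform used in Kyber or Aigis: the toy parameters (`Q = 17`, `N = 8`, `PSI = 3`) and all function names are assumptions made for illustration, and the transform is written as a direct quadratic-time evaluation rather than a fast butterfly network. The powers of `PSI` are the kind of values an optimized implementation would pre-compute and store as roots.

```python
# Toy parameters (assumed for illustration only). Requires Python 3.8+.
Q = 17     # prime modulus with Q = 1 (mod 2N)
N = 8      # ring dimension: arithmetic is in Z_Q[x] / (x^N + 1)
PSI = 3    # primitive 2N-th root of unity mod Q (3^8 = -1 mod 17)


def ntt(a):
    """Evaluate a(x) at the odd powers PSI^(2i+1), the roots of x^N + 1."""
    return [sum(a[j] * pow(PSI, (2 * i + 1) * j, Q) for j in range(N)) % Q
            for i in range(N)]


def intt(a_hat):
    """Inverse transform: interpolate the coefficients back from the values."""
    n_inv = pow(N, -1, Q)
    psi_inv = pow(PSI, -1, Q)
    return [n_inv * sum(a_hat[i] * pow(psi_inv, (2 * i + 1) * j, Q)
                        for i in range(N)) % Q
            for j in range(N)]


def ntt_mul(a, b):
    """Multiply in the ring via pointwise products in the NTT domain."""
    return intt([x * y % Q for x, y in zip(ntt(a), ntt(b))])


def schoolbook(a, b):
    """Reference negacyclic product, using x^N = -1 to wrap high terms."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] = (c[i + j] + a[i] * b[j]) % Q
            else:
                c[i + j - N] = (c[i + j - N] - a[i] * b[j]) % Q
    return c


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [8, 7, 6, 5, 4, 3, 2, 1]
    print(ntt_mul(a, b) == schoolbook(a, b))
```

This full transform needs a primitive 2N-th root of unity modulo Q. The NTT variants surveyed in this work exist to extend the scope of applicability beyond this basic setting.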

[1] Shay Gueron, et al. Speeding up R-LWE Post-quantum Key Exchange, 2016, NordSec.

[2] Morris J. Dworkin, et al. SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions, 2015.

[3] Damien Stehlé, et al. CRYSTALS - Kyber: A CCA-Secure Module-Lattice-Based KEM, 2017, 2018 IEEE European Symposium on Security and Privacy (EuroS&P).

[4] Christof Paar, et al. Generalizations of the Karatsuba Algorithm for Efficient Implementations, 2006, IACR Cryptol. ePrint Arch.

[5] J. Tukey, et al. An algorithm for the machine calculation of complex Fourier series, 1965.

[6] Erdem Alkim, et al. Compact and Simple RLWE Based Key Encapsulation Mechanism, 2019, LATINCRYPT.

[7] Paul Barrett, et al. Implementing the Rivest Shamir and Adleman Public Key Encryption Algorithm on a Standard Digital Signal Processor, 1986, CRYPTO.

[8] Zhengzhong Jin, et al. Optimal Key Consensus in Presence of Noise, 2016, IACR Cryptol. ePrint Arch.

[9] S. Cook, et al. On the Minimum Computation Time of Functions, 1969.

[10] Damien Stehlé, et al. Worst-case to average-case reductions for module lattices, 2014, Designs, Codes and Cryptography.

[11] Chris Peikert, et al. On Ideal Lattices and Learning with Errors over Rings, 2010, JACM.

[12] Oded Regev, et al. On lattices, learning with errors, random linear codes, and cryptography, 2005, STOC '05.

[13] Eric Rescorla, et al. The Transport Layer Security (TLS) Protocol Version 1.1, 2006, RFC.

[14] Zhen Liu, et al. When NTT Meets Karatsuba: Preprocess-then-NTT Technique Revisited, 2019, IACR Cryptol. ePrint Arch.

[15] Henri Cohen, et al. A course in computational algebraic number theory, 1993, Graduate texts in mathematics.

[16] Zhengzhong Jin, et al. Generic and Practical Key Establishment from Lattice, 2019, ACNS.

[17] Xianhui Lu, et al. Preprocess-then-NTT Technique and Its Applications to KYBER and NEWHOPE, 2018, IACR Cryptol. ePrint Arch.

[18] Nikil Dutt, et al. Post-quantum Lattice-based Cryptography Implementations: A Survey, 2019.

[19] Martin Roetteler, et al. Implementing Grover Oracles for Quantum Key Search on AES and LowMC, 2019, IACR Cryptol. ePrint Arch.

[20] Damien Stehlé, et al. CRYSTALS-Kyber Algorithm Specifications And Supporting Documentation, 2017.

[21] Zhenfeng Zhang, et al. Tweaking the Asymmetry of Asymmetric-Key Cryptography on Lattices: KEMs and Signatures of Smaller Sizes, 2020, IACR Cryptol. ePrint Arch.

[22] Donald E. Knuth, et al. The art of computer programming, volume 3: (2nd ed.) sorting and searching, 1998.