Cortex-M4 Optimizations for {R,M}LWE Schemes

This paper proposes several optimizations for lattice-based key-encapsulation mechanisms (KEMs) that use the Number Theoretic Transform (NTT) on the popular ARM Cortex-M4 microcontroller. The improvements take the form of faster code, achieved through more efficient modular reductions, small polynomial multiplications, and more aggressive layer merging in the NTT, as well as reduced stack usage. We test these optimizations in software implementations of Kyber and NewHope, both round 2 candidates in the NIST post-quantum standardization project, and of NewHope-Compact, a recently proposed derivative of NewHope with smaller parameters. Our software is the first implementation of NewHope-Compact on the Cortex-M4 and, for Kyber and NewHope, is faster than previous high-speed implementations on the same platform. Moreover, it provides a common framework for comparing these algorithms at the same level of optimization. Our results show that NewHope-Compact is the fastest algorithm, followed by Kyber and finally NewHope, which seems to suffer from its large modulus and error distribution for small dimensions.
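
As an illustration of the kind of modular reduction that NTT-based code relies on, the following is a minimal sketch of signed Montgomery reduction in portable C. It assumes the Kyber round 2 modulus q = 3329 and a 16-bit Montgomery radix. The names `montgomery_reduce` and `fqmul` are chosen for this sketch only, and it is not claimed to be the reduction or the instruction sequence used in the optimized Cortex-M4 code.

```c
#include <assert.h>
#include <stdint.h>

#define Q    3329    /* assumed modulus: Kyber round 2 */
#define QINV 62209u  /* Q^{-1} mod 2^16, since 3329 * 62209 = 3160 * 2^16 + 1 */

/* Signed Montgomery reduction.
 * For |a| < Q * 2^15, returns r with r = a * 2^{-16} (mod Q) and -Q < r < Q. */
static int16_t montgomery_reduce(int32_t a)
{
    assert(a > -(Q * 32768) && a < Q * 32768);

    /* Low 16 bits of a * Q^{-1}, computed without signed overflow. */
    uint16_t lo = (uint16_t)((uint32_t)a * QINV);

    /* Reinterpret those 16 bits as a signed value in [-2^15, 2^15). */
    int32_t u = (lo < 32768u) ? (int32_t)lo : (int32_t)lo - 65536;

    /* u * Q agrees with a modulo 2^16, so the difference is a multiple
     * of 2^16 and the division below is exact. */
    int32_t t = a - u * Q;
    return (int16_t)(t / 65536);
}

/* Multiplies two coefficients and removes one factor of 2^16 modulo Q. */
static int16_t fqmul(int16_t a, int16_t b)
{
    return montgomery_reduce((int32_t)a * b);
}
```

To multiply by a fixed constant z, as with NTT twiddle factors, one would typically store z * 2^16 mod Q in advance (2^16 mod 3329 = 2285), so that `fqmul` returns a value congruent to the plain product. The result lies in (-Q, Q) and may be negative, so a final conditional addition of Q is needed wherever a canonical representative is required.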
