Faster Multiplication in \(\mathbb{Z}_{2^m}[x]\) on Cortex-M4 to Speed up NIST PQC Candidates

In this paper we optimize multiplication of polynomials in \(\mathbb{Z}_{2^m}[x]\) on the ARM Cortex-M4 microprocessor. We use these optimized multiplication routines to speed up the NIST post-quantum candidates RLizard, NTRU-HRSS, NTRUEncrypt, Saber, and Kindi. For most of those schemes the only previous implementation that executes on the Cortex-M4 is the reference implementation submitted to NIST; for some of those schemes our optimized software is more than a factor of 20 faster. One of the schemes, namely Saber, has been optimized on the Cortex-M4 in a CHES 2018 paper; the multiplication routine for Saber we present here outperforms the multiplication from that paper by \(42\%\), yielding speedups of \(22\%\) for key generation, \(20\%\) for encapsulation and \(22\%\) for decapsulation. Out of the five schemes optimized in this paper, the best performance for encapsulation and decapsulation is achieved by NTRU-HRSS. Specifically, encapsulation takes just over \(400\,000\) cycles, which is more than twice as fast as any other NIST candidate that has previously been optimized on the ARM Cortex-M4.
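
To make the underlying operation concrete, the following is a minimal sketch of multiplication in \(\mathbb{Z}_{2^m}[x]\) in portable C. It is a plain schoolbook product and not the optimized routine of this paper; the function name `polymul_mod2m`, the 16-bit coefficient type and the restriction \(1 \le m \le 16\) are assumptions made for illustration. The sketch relies only on the fact that unsigned 16-bit arithmetic wraps modulo \(2^{16}\), so reduction modulo \(2^m\) can be deferred to a single mask at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Schoolbook product c = a * b in Z_{2^m}[x], for 1 <= m <= 16.
 * a and b have n coefficients each; c receives 2n - 1 coefficients.
 * Hypothetical helper for illustration, not the paper's routine. */
static void polymul_mod2m(uint16_t *c, const uint16_t *a, const uint16_t *b,
                          size_t n, unsigned m)
{
    assert(m >= 1 && m <= 16 && n >= 1);
    const uint16_t mask = (uint16_t)((1u << m) - 1u);

    for (size_t k = 0; k < 2 * n - 1; k++)
        c[k] = 0;

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            /* uint16_t wraps modulo 2^16 and 2^m divides 2^16, so
             * intermediate overflow never disturbs the low m bits. */
            c[i + j] = (uint16_t)(c[i + j] + (uint32_t)a[i] * b[j]);

    for (size_t k = 0; k < 2 * n - 1; k++)
        c[k] &= mask;   /* single reduction modulo 2^m at the end */
}
```

For example, with \(m = 3\), multiplying \(1 + 2x + 3x^2\) by \(4 + 5x + 6x^2\) gives the integer coefficients \(4, 13, 28, 27, 18\), which reduce modulo 8 to \(4, 5, 4, 3, 2\). Reduction into a quotient ring such as \(\mathbb{Z}_{2^m}[x]/(x^n + 1)\) would be a separate step applied to the \(2n - 1\) output coefficients.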
