Faster AVX2 optimized NTT multiplication for Ring-LWE lattice cryptography

Constant-time polynomial multiplication is one of the most time-consuming operations in many lattice-based cryptographic constructions. For schemes based on the hardness of Ring-LWE in power-of-two cyclotomic fields with completely splitting primes, the AVX2 optimized implementation of the Number-Theoretic Transform (NTT) from the NewHope key-exchange scheme is the state of the art for fast multiplication. It uses floating point vector instructions. We show that by using a modification of the Montgomery reduction algorithm that enables a fast approach with integer instructions, we can improve on the polynomial multiplication speeds of NewHope and Kyber by a factor of 4.2 and 6.3 on Skylake, respectively.