Software Implementation of Modular Exponentiation, Using Advanced Vector Instructions Architectures

This paper describes an algorithm for computing modular exponentiation using vector (SIMD) instructions. It demonstrates, for the first time, how such a software approach can outperform classical scalar (ALU) implementations on high-end x86_64 platforms that have a wide SIMD architecture. We target speeding up RSA2048 on Intel's soon-to-arrive platforms that support the AVX2 instruction set. To this end, we applied our algorithm to generate an optimized AVX2-based software implementation of 1024-bit modular exponentiation. This implementation is seamlessly integrated into OpenSSL as a patch over OpenSSL 1.0.1. Our results show that our implementation requires 51% fewer instructions than the current OpenSSL 1.0.1 implementation. This illustrates the potentially significant speedup in RSA2048 performance that is expected on the coming (2013) Intel processors. The impact of such a speedup on servers is noticeable, especially since NIST recommends migrating to RSA2048 starting from 2013.
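The link between RSA2048 and 1024-bit modular exponentiation comes from the standard Chinese Remainder Theorem (CRT) technique: a private-key operation modulo a 2048-bit n = p*q can be computed as two exponentiations modulo the 1024-bit primes p and q. The following Python sketch illustrates that decomposition only. It uses toy primes and Python's built-in `pow`, not the vectorized AVX2 algorithm of the paper, and the function name `rsa_crt_private_op` is a hypothetical one chosen for this illustration.

```python
def rsa_crt_private_op(c, p, q, d):
    """Compute c^d mod (p*q) using two half-size modular exponentiations.

    Illustrative sketch only: not constant-time and not the paper's method.
    """
    dp = d % (p - 1)            # private exponent reduced for the mod-p half
    dq = d % (q - 1)            # private exponent reduced for the mod-q half
    q_inv = pow(q, -1, p)       # q^(-1) mod p (requires Python 3.8+)

    m1 = pow(c % p, dp, p)      # half-size exponentiation modulo p
    m2 = pow(c % q, dq, q)      # half-size exponentiation modulo q

    # Garner's recombination: lift (m1, m2) to the unique residue mod p*q.
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


# Toy parameters (assumed for the demonstration; real RSA2048 uses 1024-bit primes).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

message = 65
cipher = pow(message, e, n)
recovered = rsa_crt_private_op(cipher, p, q, d)
```

With 1024-bit primes, the two `pow` calls are the 1024-bit modular exponentiations whose cost dominates an RSA2048 private-key operation, which is why accelerating that single primitive speeds up RSA2048 as a whole.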
