An Efficient Architecture for Interleaved Modular Multiplication

Modular multiplication is the basic operation in most public key cryptosystems, such as RSA, DSA, ECC, and DH key exchange. Unfortunately, very large operands (in order of 1024 or 2048 bits) must be used to provide sufficient security strength. The use of such big numbers dramatically slows down the whole cipher system, especially when running on embedded processors. So far, customized hardware accelerators developed on FPGAs or ASICs were the best choice for accelerating modular multiplication in embedded environments. On the other hand, many algorithms have been developed to speed up such operations. Examples are the Montgomery modular multiplication and the interleaved modular multiplication algorithms. Combining both customized hardware with an efficient algorithm is expected to provide a much faster

[1]  Q. K. Kop,et al.  Fast algorithm for modular reduction , 1998 .

[2]  Christof Paar,et al.  Efficient hardware architectures for modular multiplication on FPGAs , 2005, International Conference on Field Programmable Logic and Applications, 2005..

[3]  G. R. Blakley,et al.  A Computer Algorithm for Calculating the Product AB Modulo M , 1983, IEEE Trans. Computers.

[4]  P. L. Montgomery Modular multiplication without trial division , 1985 .

[5]  Manfred Schimmler,et al.  Area and time efficient modular multiplication of large integers , 2003, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003.

[6]  David Narh Amanor,et al.  Efficient Hardware Architectures for Modular Multiplication , 2005 .