Montgomery Modular Multiplication Algorithm on Multi-Core Systems

In this paper, we investigate efficient software implementations of the Montgomery modular multiplication algorithm on a multi-core system. A HW/SW co-design technique is used to find an efficient system architecture and instruction scheduling method. We first implement Montgomery modular multiplication on a multi-core system with general-purpose cores. We then speed it up by adopting the Multiply-Accumulate (MAC) operation in each core. As a result, performance improves by factors of 1.53 and 2.15 for 256-bit and 1024-bit Montgomery modular multiplication, respectively.
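To make the role of the MAC operation concrete, the following is a minimal single-core sketch of word-level Montgomery multiplication in the well-known CIOS (coarsely integrated operand scanning) form. It is not the implementation evaluated in this paper: the 32-bit word size `W`, the helper names `to_words`, `from_words`, and `mont_mul`, and the choice of CIOS are all assumptions made for illustration, and the multi-core partitioning and instruction scheduling are not modeled. Each inner-loop step has the shape `t[j] + x * y + carry`, which is the pattern a hardware MAC unit would execute in one operation.

```python
# Illustrative sketch only (assumed names and word size, not the paper's code).
# Requires Python 3.8+ for pow(x, -1, m).

W = 32                      # assumed word size in bits
MASK = (1 << W) - 1


def to_words(x, s):
    """Split integer x into s little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]


def from_words(words):
    """Recombine little-endian W-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(words))


def mont_mul(a, b, n, s):
    """Return a * b * R^-1 mod n with R = 2^(W*s).

    Assumes n is odd, n < R, and 0 <= a, b < n.
    """
    n0_inv = (-pow(n, -1, 1 << W)) & MASK      # -n^-1 mod 2^W
    aw, bw, nw = to_words(a, s), to_words(b, s), to_words(n, s)
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += a * b[i]
        carry = 0
        for j in range(s):
            acc = t[j] + aw[j] * bw[i] + carry  # multiply-accumulate
            t[j] = acc & MASK
            carry = acc >> W
        acc = t[s] + carry
        t[s] = acc & MASK
        t[s + 1] = acc >> W
        # Reduction step: add m * n so the low word becomes zero, then shift
        m = (t[0] * n0_inv) & MASK
        carry = (t[0] + m * nw[0]) >> W
        for j in range(1, s):
            acc = t[j] + m * nw[j] + carry      # multiply-accumulate
            t[j - 1] = acc & MASK
            carry = acc >> W
        acc = t[s] + carry
        t[s - 1] = acc & MASK
        t[s] = t[s + 1] + (acc >> W)
    r = from_words(t[:s + 1])
    return r - n if r >= n else r               # final conditional subtraction
```

For the 256-bit case, `s = 8` words with `W = 32`; for the 1024-bit case, `s = 32`. Operands are brought into the Montgomery domain by multiplying with `R*R mod n` via `mont_mul`, and brought back by a final `mont_mul(x, 1, n, s)`.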
