Instruction set extension for long integer modulo arithmetic on RISC-based smart cards

Modulo multiplication of long integers (≥ 1024 bits) is the dominant operation in many public-key cryptosystems such as RSA and Diffie-Hellman. Implementing modulo arithmetic efficiently is challenging, particularly on smart cards, which have constrained resources and relatively low clock frequencies. We present the concept of an application-specific instruction set extension (ISE) for long integer arithmetic. We introduce an optimized multiply-and-accumulate (MAC) unit that computes a × b + c + d with a single instruction, where a, b, c, and d are single-precision words (unsigned integers). This additional instruction is simple to incorporate into common RISC architectures such as MIPS32. Experimental results show that the inner-product operation of a multiple-precision multiplication can be accelerated by a factor of two without increasing the processor's clock frequency. We also estimate the execution time of a 1024-bit modulo exponentiation, assuming that this special MAC instruction is available. The proposed ISE is an alternative to a crypto co-processor, especially for multi-application smart cards (e.g., Java cards) with an embedded 32-bit RISC core.
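
To illustrate why an a × b + c + d operation suits the inner loop of a multiple-precision multiplication, the following is a minimal Python model. It is a sketch, not the authors' hardware design or MIPS32 code. It assumes a 32-bit word size, and the names `mac`, `mp_mul`, `to_words`, and `from_words` are hypothetical helpers introduced only for this example. The key property is that for w-bit words the result never exceeds (2^w − 1)^2 + 2(2^w − 1) = 2^(2w) − 1, so it always fits in exactly two words, and one such operation covers the multiply, the accumulate, and the carry propagation of a single inner-loop step.

```python
W = 32                  # word size in bits (assumption: 32-bit RISC core)
MASK = (1 << W) - 1     # largest single-precision word


def mac(a, b, c, d):
    """Software model of a*b + c + d on single-precision words.

    Returns (high_word, low_word). The sum is at most 2^(2W) - 1,
    so it always fits in two words without overflow.
    """
    t = a * b + c + d
    return t >> W, t & MASK


def to_words(x, n):
    """Split a non-negative integer into n little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_words(words):
    """Recombine little-endian W-bit words into one integer."""
    return sum(w << (W * i) for i, w in enumerate(words))


def mp_mul(A, B):
    """Schoolbook (operand-scanning) multiplication of two word arrays.

    Each inner-loop step is one mac() call: the product a*b, the
    partial result R[i + j], and the running carry are combined at once.
    """
    R = [0] * (len(A) + len(B))
    for i, b in enumerate(B):
        carry = 0
        for j, a in enumerate(A):
            carry, R[i + j] = mac(a, b, R[i + j], carry)
        R[i + len(A)] = carry   # this word has not been written before
    return R
```

For two 1024-bit operands (32 words each), `mp_mul` performs 32 × 32 = 1024 `mac` steps. On a processor without such an instruction, each step would typically need a separate multiply plus several add-with-carry operations.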
