Architectural Support for Long Integer Modulo Arithmetic on RISC-Based Smart Cards

Various algorithms for public-key cryptography, such as the Rivest-Shamir-Adleman (RSA) or Diffie-Hellman algorithms, are based on long integer arithmetic operations, most notably modulo multiplication. To be adequate for long-term security, the modulus should have a length of at least 1024 bits. Long integer arithmetic is difficult to implement efficiently in software, particularly on smart cards, because of their constrained resources and relatively low clock frequency. In this paper we investigate the potential of application-specific instruction set extensions for cryptographic workloads such as long integer arithmetic. We define two special instructions that carry out computations of the form a × b + c + d, where a, b, c, and d are single-precision words (unsigned integers). These additional instructions can be executed on an optimized multiply/accumulate unit and are therefore simple to incorporate into common RISC architectures such as the MIPS32. The proposed extensions cause almost no speed or area penalty since no extra functional units are required. Experimental results indicate that the inner-loop operation of a multiple-precision multiplication can be accelerated by a factor of almost 2. We also estimate the execution time of a 1024-bit modulo exponentiation under the assumption that these special instructions are available. The presented concept is an alternative to a crypto co-processor, especially for multi-application smart cards (e.g., Java cards) with an embedded 32-bit RISC core.
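To illustrate why an operation of the form a × b + c + d fits the inner loop of a multiple-precision multiplication, the following is a minimal Python sketch, not the paper's implementation. It assumes 32-bit words and a plain operand-scanning (schoolbook) multiplication; the names `mul_add_add`, `mp_mul`, `to_words`, and `from_words` are hypothetical helpers introduced here for illustration only. The key property is that for w-bit words the result of a × b + c + d is at most 2^(2w) − 1, so it always fits in a double-precision word without overflow.

```python
W = 32                 # assumed word size in bits
MASK = (1 << W) - 1    # largest single-precision word

def mul_add_add(a, b, c, d):
    """Emulate a*b + c + d on single-precision words (hypothetical helper).

    Returns the double-precision result as a (high word, low word) pair.
    """
    t = a * b + c + d
    # (2^W - 1)^2 + 2*(2^W - 1) = 2^(2W) - 1, so two words always suffice.
    assert t < (1 << (2 * W))
    return t >> W, t & MASK

def mp_mul(x, y):
    """Operand-scanning product of two little-endian word lists."""
    z = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            # Inner-loop step: multiply, add the partial result, add the carry.
            carry, z[i + j] = mul_add_add(xi, yj, z[i + j], carry)
        z[i + len(y)] = carry
    return z

def to_words(n, count):
    """Split a non-negative integer into `count` little-endian words."""
    return [(n >> (W * k)) & MASK for k in range(count)]

def from_words(words):
    """Reassemble a little-endian word list into an integer."""
    return sum(w << (W * k) for k, w in enumerate(words))
```

In this sketch each inner-loop iteration is a single a × b + c + d step, which is the kind of computation the abstract says the proposed instructions carry out on the multiply/accumulate unit. A 1024-bit operand corresponds to `to_words(n, 32)` under the 32-bit word assumption.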
