A regular parallel RSA processor

A high-performance VLSI implementation of the RSA algorithm based on a systolic array is presented. High-speed RSA applications require parallel implementations of modular multipliers. In addition to the systolic architecture, which is popular in hardware-based RSA systems, a block-based scheme is used to further eliminate global signals, with a pipelined bus conveying data globally. The control signals and intermediate results needed for sequential multiplications are passed through shift registers. All signals except the clock are confined to a single block or run between two adjacent blocks. A carry-save adder structure computes the iterative step of the algorithm, which improves speed and saves area. Long modular multipliers also suffer from large fanout. Novel architectures are proposed to eliminate this fanout bottleneck, reducing the minimum achievable clock period of long modular multipliers. Compared with the original modular multiplier architecture, which has the fanout bottleneck, the proposed architectures increase throughput by more than 7% with no increase in area. The Chinese remainder theorem (CRT) technique increases the decryption data rate by a factor of four. Two redundant blocks are added to accommodate the on-line partitioning of the multiplier and variation in the lengths of P and Q in CRT mode.
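The two arithmetic ideas named above, a carry-save iterative step and CRT-mode decryption, can be illustrated with a minimal software sketch. It rests on assumptions the abstract does not confirm: the modular multiplier is taken to be a radix-2 Montgomery multiplier, and the function names (`carry_save_add`, `montgomery_multiply`, `rsa_crt_decrypt`) and the toy key are hypothetical. It models the arithmetic only, not the systolic blocks, the pipelined bus, or the fanout fix.

```python
def carry_save_add(a, b, c):
    """3:2 compression: return (s, carry) with s + carry == a + b + c.

    No carry ripples across bit positions, which is what keeps the
    per-iteration delay independent of the operand length in hardware.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def montgomery_multiply(x, y, n, k):
    """Return x * y * 2**(-k) mod n for odd n, with x, y < n < 2**k.

    Assumption: radix-2 Montgomery multiplication; the running value is
    kept in redundant (sum, carry) form throughout the loop.
    """
    s, c = 0, 0
    for i in range(k):
        xi = (x >> i) & 1
        s, c = carry_save_add(s, c, xi * y)
        q = s & 1  # c is always even, so the parity of s + c is the LSB of s
        s, c = carry_save_add(s, c, q * n)
        # s + c is now even and c is even, so both halves shift exactly
        s, c = s >> 1, c >> 1
    t = s + c  # single carry-propagate addition at the very end
    return t - n if t >= n else t


def rsa_crt_decrypt(ciphertext, p, q, d):
    """RSA decryption via the CRT (Garner recombination), Python 3.8+."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    mp = pow(ciphertext % p, dp, p)  # half-length exponentiation mod p
    mq = pow(ciphertext % q, dq, q)  # half-length exponentiation mod q
    h = (q_inv * (mp - mq)) % p
    return mq + h * q


if __name__ == "__main__":
    # Toy key for illustration only: p = 61, q = 53, e = 17, d = 2753
    p, q, e, d = 61, 53, 17, 2753
    message = 65
    ciphertext = pow(message, e, p * q)
    print(rsa_crt_decrypt(ciphertext, p, q, d))
```

In a bit-serial multiplier, a k-bit exponentiation takes on the order of k² cycles, so running the two half-length exponentiations side by side is one plausible reading of the factor-of-four gain quoted above. That reading is an inference, not something the abstract states.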
