Performance-Scalable Array Architectures for Modular Multiplication

Modular multiplication is a fundamental operation in numerous public-key cryptosystems, including the RSA method. The increasing popularity of internet e-commerce and other security applications translates into a demand for a performance-scalable hardware design framework. Previous scalable hardware methodologies either were not systolic, and thus involved performance-degrading full-word-length broadcasts, or were not scalable beyond linear array size. In this paper, these limitations are overcome with the introduction of three classes of performance-scalable modular multiplication architectures based on systolic arrays. Very high clock rates are feasible, since the cells composing the architectures are of bit-level complexity. Architectural methods based on both binary and high-radix modular multiplication are derived. All techniques are constructed to allow additional flexibility in accommodating the impact of interconnect delay within the design environment.
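As a point of reference for the binary case, the following is a minimal software sketch of a shift-and-add (interleaved) modular multiplication, the kind of bit-level recurrence that array cells can compute one bit per step. It is an illustration under stated assumptions, not the architecture derived in the paper: the function name `binary_modmul` and the choice of a most-significant-bit-first scan are hypothetical.

```python
def binary_modmul(a: int, b: int, m: int) -> int:
    """Return (a * b) % m using a radix-2 shift-and-add recurrence.

    Illustrative sketch only; the name and MSB-first scan order are
    assumptions, not taken from the paper.
    """
    if m <= 0:
        raise ValueError("modulus must be positive")
    a %= m
    b %= m
    r = 0
    # Scan the bits of a from most significant to least significant.
    for i in reversed(range(a.bit_length())):
        bit = (a >> i) & 1
        # Shift the partial result and conditionally add b.
        r = 2 * r + bit * b
        # Since r < m and b < m before this step, the new r is below 3m,
        # so at most two subtractions restore r to the range [0, m).
        if r >= m:
            r -= m
        if r >= m:
            r -= m
    return r
```

Each loop iteration touches only one bit of `a`, which is why the per-step work can be mapped onto simple cells; a high-radix variant would consume several bits of `a` per iteration at the cost of a more complex reduction step.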
