A fast digit based Montgomery multiplier designed for FPGAs with DSP resources

Abstract A fast Montgomery multiplier design utilizing the DSP resources in modern FPGAs is presented. In the proposed design, the operand size is a multiple of 528 bits and the digit size is 48 bits. The design has 48 × 48 bit digit multipliers, built from DSP slices performing 24 × 16 bit multiplications, and a carry-select accumulator, built from DSP slices performing 48 bit additions. The proposed Montgomery multiplier works iteratively. In each iteration, one digit of an operand is multiplied by all digits of the other operand, and the result is accumulated and reduced by the Montgomery method. Each iteration takes eight cycles rather than one, which keeps the digit multiplier count low and saves hardware resources. The proposed design is implemented for Virtex-7 FPGAs. The performance results are comparable with the best results in the literature, and substantial savings in FPGA logic resources are obtained.
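The iteration described in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the hardware design itself: it assumes a single 528 bit operand (11 digits of 48 bits), and the function names `mul48_from_24x16` and `montgomery_multiply` are hypothetical. The way the 48 × 48 bit product is split into six 24 × 16 bit partial products is one plausible arrangement and is an assumption, since the abstract does not specify it. The eight-cycle scheduling and the carry-select accumulator are not modeled.

```python
DIGIT_BITS = 48
DIGIT_MASK = (1 << DIGIT_BITS) - 1


def mul48_from_24x16(a, b):
    """48 x 48 bit product assembled from 24 x 16 bit partial products.

    Assumed split: a into two 24 bit chunks, b into three 16 bit chunks.
    """
    assert 0 <= a <= DIGIT_MASK and 0 <= b <= DIGIT_MASK
    product = 0
    for i in range(2):
        a_chunk = (a >> (24 * i)) & 0xFFFFFF
        for j in range(3):
            b_chunk = (b >> (16 * j)) & 0xFFFF
            product += (a_chunk * b_chunk) << (24 * i + 16 * j)
    return product


def to_digits(x, num_digits):
    """Split x into little-endian 48 bit digits."""
    return [(x >> (DIGIT_BITS * k)) & DIGIT_MASK for k in range(num_digits)]


def montgomery_multiply(a, b, n, num_digits=11):
    """Return a * b * R^-1 mod n, where R = 2**(48 * num_digits).

    Requires n odd and 0 <= a, b < n < R. Uses pow(n, -1, m), Python 3.8+.
    """
    assert n % 2 == 1 and 0 <= a < n and 0 <= b < n
    assert n < (1 << (DIGIT_BITS * num_digits))
    # n_prime = -n^-1 mod 2^48, the per-digit reduction constant
    n_prime = (-pow(n, -1, 1 << DIGIT_BITS)) & DIGIT_MASK
    a_digits = to_digits(a, num_digits)
    b_digits = to_digits(b, num_digits)
    n_digits = to_digits(n, num_digits)

    acc = 0
    for a_i in a_digits:
        # Multiply one digit of a by every digit of b and accumulate.
        for j, b_j in enumerate(b_digits):
            acc += mul48_from_24x16(a_i, b_j) << (DIGIT_BITS * j)
        # Choose q so that the lowest digit of acc + q * n is zero.
        q = mul48_from_24x16(acc & DIGIT_MASK, n_prime) & DIGIT_MASK
        for j, n_j in enumerate(n_digits):
            acc += mul48_from_24x16(q, n_j) << (DIGIT_BITS * j)
        acc >>= DIGIT_BITS  # exact division by 2^48
    # The loop leaves acc < 2n, so one conditional subtraction suffices.
    if acc >= n:
        acc -= n
    return acc
```

The result stays in the Montgomery domain, so it can be compared against `a * b * pow(R, -1, n) % n` with `R = 1 << 528` to check the model for any odd modulus below `R`.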
