A Flexible Architecture for Modular Arithmetic Hardware Accelerators based on RNS

Modular arithmetic is a building block for a variety of applications potentially supported on embedded systems. An approach to turn modular arithmetic more efficient is to identify algorithmic modifications that would enhance the parallelization of the target arithmetic in order to exploit the properties of parallel devices and platforms. The Residue Number System (RNS) introduces data-level parallelism, enabling the parallelization even for algorithms based on modular arithmetic with several data dependencies. However, the mapping of generic algorithms to full RNS-based implementations can be complex and the utilization of suitable hardware architectures that are scalable and adaptable to different demands is required. This paper proposes and discusses an architecture with scalability features for the parallel implementation of algorithms relying on modular arithmetic fully supported by the Residue Number System (RNS). The systematic mapping of a generic modular arithmetic algorithm to the architecture is presented. It can be applied as a high level synthesis step for an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA) design flow targeting modular arithmetic algorithms. An implementation with the Xilinx Virtex 4 and Altera Stratix II Field Programmable Gate Array (FPGA) technologies of the modular exponentiation and Elliptic Curve (EC) point multiplication, used in the Rivest-Shamir-Adleman (RSA) and (EC) cryptographic algorithms, suggests latency results in the same order of magnitude of the fastest hardware implementations of these operations known to date.

[1]  P. L. Montgomery Modular multiplication without trial division , 1985 .

[2]  Tim Güneysu,et al.  Ultra High Performance ECC over NIST Primes on Commercial FPGAs , 2008, CHES.

[3]  Thanos Stouraitis,et al.  An RNS Implementation of an $F_{p}$ Elliptic Curve Point Multiplier , 2009, IEEE Transactions on Circuits and Systems I: Regular Papers.

[4]  Atsushi Shimbo,et al.  Cox-Rower Architecture for Fast Parallel Montgomery Multiplication , 2000, EUROCRYPT.

[5]  Ramdas Kumaresan,et al.  Fast Base Extension Using a Redundant Modulus in RNS , 1989, IEEE Trans. Computers.

[6]  Leonel Sousa,et al.  Modular Arithmetic Implementation with the Residue Number System ( RNS ) , 2012 .

[7]  Leonel Sousa,et al.  RNS-Based Elliptic Curve Point Multiplication for Massive Parallel Architectures , 2012, Comput. J..

[8]  Daisuke Suzuki,et al.  How to Maximize the Potential of FPGA Resources for Modular Exponentiation , 2007, CHES.

[9]  Jean-Claude Bajard,et al.  Modular multiplication and base extensions in residue number systems , 2001, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001.

[10]  Ricardo Chaves,et al.  Compact and Flexible Microcoded Elliptic Curve Processor for Reconfigurable Devices , 2009, 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines.

[11]  Adi Shamir,et al.  A method for obtaining digital signatures and public-key cryptosystems , 1978, CACM.

[12]  Leonel Sousa,et al.  Elliptic Curve point multiplication on GPUs , 2010, ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors.

[13]  Tim Güneysu,et al.  Exploiting the Power of GPUs for Asymmetric Cryptography , 2008, CHES.

[14]  Nicolas Guillermin A High Speed Coprocessor for Elliptic Curve Scalar Multiplications over \mathbbFp\mathbb{F}_p , 2010, CHES.

[15]  Leonel Sousa,et al.  The CRNS framework and its application to programmable and reconfigurable cryptography , 2013, TACO.

[16]  Atsushi Shimbo,et al.  Implementation of RSA Algorithm Based on RNS Montgomery Multiplication , 2001, CHES.