Modular Arithmetic Implementation with the Residue Number System (RNS)

where $S_X$ is the size of X in bits and $X_j$ is the j-th word of the binary representation of X split into k-bit words. Considering (1), to obtain the RNS representation of X, the constants $2^{kj} \bmod m_{e,i}$ can be precomputed and stored in the accelerator, so the conversion is accomplished with a few modular multiply-and-accumulate operations in each RNS channel. For the reverse conversion, that is, converting the computation results back to the binary representation, several approaches exist [3, 5, 7]. The proposed framework adopts the conversion presented in [5], since it is suggested to be less complex and to require smaller dynamic ranges (smaller moduli sets) [1, 4]. This conversion is based on the Chinese Remainder Theorem (CRT).
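The forward conversion described above, together with a reverse conversion, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `to_rns` and `from_rns`, the example moduli, and the word size k = 16 are hypothetical and do not come from the framework, and the reverse conversion shown is the textbook CRT reconstruction, not necessarily the exact variant adopted from [5].

```python
from math import prod


def to_rns(x, moduli, k=16):
    """Forward conversion: split x into k-bit words and accumulate per channel."""
    mask = (1 << k) - 1
    words = []
    while x:
        words.append(x & mask)  # X_j, least significant word first
        x >>= k
    residues = []
    for m in moduli:
        # The constants 2^(k*j) mod m would be precomputed and stored
        # in the accelerator; here they are computed on the fly.
        consts = [pow(2, k * j, m) for j in range(len(words))]
        acc = 0
        for w, c in zip(words, consts):
            acc = (acc + (w % m) * c) % m  # modular multiply-and-accumulate
        residues.append(acc)
    return residues


def from_rns(residues, moduli):
    """Reverse conversion with the textbook CRT (assumes pairwise coprime moduli)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the inverse of Mi modulo m (Python 3.8+).
        x += ((r * pow(Mi, -1, m)) % m) * Mi
    return x % M


# Hypothetical example: three pairwise coprime moduli, dynamic range 2^48 - 2^16.
moduli = [2**16 - 1, 2**16, 2**16 + 1]
x = 0x123456789ABC
residues = to_rns(x, moduli, k=16)
recovered = from_rns(residues, moduli)
```

The round trip recovers x only when x is smaller than the product of the moduli, which is the dynamic range of the chosen moduli set.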

[1]  Atsushi Shimbo, et al. Cox-Rower Architecture for Fast Parallel Montgomery Multiplication, 2000, EUROCRYPT.

[2]  Leonel Sousa, et al. Elliptic Curve point multiplication on GPUs, 2010, ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors.

[3]  P. L. Montgomery. Modular multiplication without trial division, 1985.

[4]  Nicolas Guillermin. A High Speed Coprocessor for Elliptic Curve Scalar Multiplications over $\mathbb{F}_p$, 2010, CHES.

[5]  Leonel Sousa, et al. RNS-Based Elliptic Curve Point Multiplication for Massive Parallel Architectures, 2012, Comput. J.

[6]  Jean-Claude Bajard, et al. Modular multiplication and base extensions in residue number systems, 2001, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001.

[7]  Ramdas Kumaresan, et al. Fast Base Extension Using a Redundant Modulus in RNS, 1989, IEEE Trans. Computers.