Fast scalable radix-4 Montgomery modular multiplier

Montgomery modular multiplication is widely used in public-key cryptosystems such as Rivest-Shamir-Adleman (RSA) and elliptic curve cryptography (ECC). This work presents a word-based, Booth-encoded radix-4 Montgomery modular multiplication algorithm for a low-latency scalable architecture. The data dependency caused by the inherent right shift of the intermediate results in the conventional radix-4 Montgomery modular multiplication algorithm is alleviated, so the latency between neighboring processing elements (PEs) is exactly one cycle. An operand reduction scheme keeps the number of equivalent operands in the accumulation from increasing. Implementation results based on the same technology show that, compared with other Booth-encoded radix-4 Montgomery modular multipliers, the proposed design reduces the time to complete one 1024-bit Montgomery modular multiplication by at least 23%.
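
For orientation, the underlying arithmetic can be illustrated with a minimal bit-level reference model of Booth-encoded radix-4 Montgomery multiplication. This is only a sketch of the conventional textbook recurrence, not the word-based scalable algorithm or the operand reduction scheme proposed in this work. The function names, the digit count k = floor(n/2) + 1, and the final single correction step are assumptions chosen for illustration. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
def booth_radix4_digits(x, k):
    """Return k radix-4 Booth digits of x (least significant first), each in {-2..2}.

    Assumes 0 <= x < 2**(2*k - 1) so that the recoded value equals x.
    """
    digits = []
    prev = 0  # implicit bit x_{-1} = 0
    for i in range(k):
        b0 = (x >> (2 * i)) & 1
        b1 = (x >> (2 * i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits


def montgomery_radix4(x, y, m):
    """Return (x * y * 4**(-k) mod m, k) for odd m and 0 <= x, y < m.

    Illustrative model only: hypothetical helper, not the paper's architecture.
    """
    assert m % 2 == 1 and 0 <= x < m and 0 <= y < m
    n = m.bit_length()
    k = n // 2 + 1                   # enough digits for the Booth recoding of x
    m_prime = (-pow(m, -1, 4)) % 4   # -m^{-1} mod 4
    s = 0
    for d in booth_radix4_digits(x, k):
        s += d * y                   # add the Booth-selected multiple of y
        q = (s * m_prime) % 4        # quotient digit that clears the two low bits
        s = (s + q * m) >> 2         # exact division by 4 (the right shift)
    # s stays strictly between -m and 2m, so one correction is enough
    if s < 0:
        s += m
    elif s >= m:
        s -= m
    return s, k
```

The right shift at the end of each iteration is the step that creates the data dependency between successive iterations discussed above. A caller can check the model against direct modular arithmetic by comparing the returned value with `(x * y * pow(4, -k, m)) % m`.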