High-speed radix-16 design of a scalable Montgomery multiplier

This paper describes an improved version of the Tenca-Todorov-Koc word-based radix-8 Montgomery multiplier. The new design uses radix 16 for faster computation without adding any hardware, adjusts the data path to shorten the critical path, and requires only half the FIFO memory. Like the Tenca-Todorov-Koc design, it is reconfigurable to accept any input precision. An ASIC implementation in 0.25 μm CMOS standard-cell technology can perform a 2048-bit modular exponentiation in 28 ms at a 125 MHz clock frequency.
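The following is a minimal Python sketch of the arithmetic that a radix-16 Montgomery multiplier computes, namely A·B·R⁻¹ mod M with R = 16ⁿ. It assumes a plain full-precision, digit-serial loop that consumes one 4-bit digit of A per iteration. Moving from radix 8 to radix 16 retires four bits instead of three in each iteration. This sketch does not model the paper's word-based, pipelined, scalable hardware organization. The names `mont_mul_radix16` and `mont_exp` are hypothetical and are used only for illustration.

```python
def mont_mul_radix16(a: int, b: int, m: int, n_digits: int) -> int:
    """Return a*b*R^-1 mod m, where R = 16**n_digits.

    Assumes m is odd, 16**n_digits > m, a < 16**n_digits, and b < m.
    Each iteration consumes one 4-bit (radix-16) digit of a.
    """
    if m % 2 == 0:
        raise ValueError("Montgomery modulus must be odd")
    # m_prime = -m^-1 mod 16, so that s + q*m is divisible by 16
    m_prime = (-pow(m, -1, 16)) % 16
    s = 0
    for i in range(n_digits):
        a_i = (a >> (4 * i)) & 0xF          # i-th radix-16 digit of a
        s += a_i * b                        # accumulate partial product
        q = ((s & 0xF) * m_prime) & 0xF     # quotient digit that zeroes the low 4 bits
        s = (s + q * m) >> 4                # exact division by 16; keeps s < 2m
    return s - m if s >= m else s           # single final correction


def mont_exp(base: int, exp: int, m: int, n_digits: int) -> int:
    """Left-to-right square-and-multiply built on mont_mul_radix16 (assumes m > 1)."""
    r_mod = (1 << (4 * n_digits)) % m       # 1 in Montgomery form (R mod m)
    x = (base << (4 * n_digits)) % m        # base in Montgomery form (base*R mod m)
    acc = r_mod
    for bit in bin(exp)[2:]:
        acc = mont_mul_radix16(acc, acc, m, n_digits)
        if bit == "1":
            acc = mont_mul_radix16(acc, x, m, n_digits)
    return mont_mul_radix16(acc, 1, m, n_digits)  # convert back out of Montgomery form
```

For a 2048-bit modulus, `n_digits = 512`, because 16⁵¹² = 2²⁰⁴⁸. As a rough cross-check on the reported figure, 28 ms at 125 MHz is about 3.5 million clock cycles for one 2048-bit exponentiation.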