Long Modular Multiplication for Cryptographic Applications

A digit-serial, multiplier-accumulator based cryptographic co-processor architecture is proposed, similar to fix-point DSP’s with enhancements, supporting long modular arithmetic and general computations. Several new “column-sum” variants of popular quadratic time modular multiplication algorithms are presented (Montgomery and interleaved division-reduction with or without Quisquater scaling), which are faster than the traditional implementations, need no or very little memory beyond the operand storage and perform squaring about twice faster than general multiplications or modular reductions. They provide similar advantages in software for general purpose CPU’s.