Systolic Modular Multiplication

A systolic array for modular multiplication is presented using the ideally suited algorithm of P.L. Montgomery (1985). Throughput is one modular multiplication every clock cycle, with a latency of 2n+2 cycles for multiplicands having n digits. Its main use would be where many consecutive multiplications are done, as in RSA cryptosystems. >