Hardware implementation of large number multiplication by FFT with modular arithmetic

Modular multiplication (MM) of large integers is the foundation of most public-key cryptosystems, notably RSA, ElGamal, and elliptic curve cryptosystems. MM algorithms have therefore been studied extensively. Most work is based on the well-known Montgomery multiplication method (MMM) and its variants, which require multiplication in N. Authors have generally avoided the fast Fourier transform (FFT) method, believing it impractical for present system sizes despite its lower order of complexity. In this paper, the authors present the design and hardware implementation of an FFT-based algorithm that uses modular arithmetic to compute very large number multiplications efficiently. The algorithm has been implemented in CASM, an intermediate-level HDL developed in the laboratory. The target architecture is an FPGA. The algorithm is scalable and can easily be mapped to any operand size. Results show that such an implementation starts to be useful for operands of 4096 bits and beyond.
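
The abstract does not spell out the algorithm itself, so the following is only a software sketch of the general idea of FFT-style multiplication carried out in modular arithmetic (a number-theoretic transform), not the authors' CASM design or hardware architecture. The prime 998244353, the generator 3, the 8-bit limb size, and all function names are assumptions chosen for illustration; inputs are assumed to be non-negative integers.

```python
# Illustrative assumptions (not from the paper): modulus, generator, limb size.
P = 998244353      # prime equal to 119 * 2**23 + 1, so it supports transforms up to 2**23 points
G = 3              # a primitive root modulo P
LIMB_BITS = 8      # operands are split into 8-bit limbs


def ntt(values, invert=False):
    """Iterative radix-2 transform over the integers modulo P."""
    a = list(values)
    n = len(a)                      # must be a power of two
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly stages.
    length = 2
    while length <= n:
        w_len = pow(G, (P - 1) // length, P)    # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, P - 2, P)        # modular inverse via Fermat's little theorem
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % P
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


def to_limbs(x):
    """Split a non-negative integer into little-endian limbs of LIMB_BITS bits."""
    mask = (1 << LIMB_BITS) - 1
    limbs = []
    while x:
        limbs.append(x & mask)
        x >>= LIMB_BITS
    return limbs or [0]


def ntt_multiply(x, y):
    """Multiply two non-negative integers by convolving their limbs modulo P."""
    a, b = to_limbs(x), to_limbs(y)
    # Every convolution coefficient must stay below P to be recovered exactly.
    assert min(len(a), len(b)) * ((1 << LIMB_BITS) - 1) ** 2 < P
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    product = ntt([u * v % P for u, v in zip(fa, fb)], invert=True)
    # Carry propagation: evaluate the product polynomial at 2**LIMB_BITS.
    result = 0
    for i, coeff in enumerate(product):
        result += coeff << (LIMB_BITS * i)
    return result
```

With these parameters, a call such as `ntt_multiply(x, y)` on two 4096-bit operands uses 512 limbs per operand and a 1024-point transform. Working modulo a prime keeps every intermediate value an exact integer, which avoids the rounding concerns of a floating-point FFT.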
