Putting Fürer Algorithm into Practice with the BPAS Library

Fast algorithms for integer and polynomial multiplication play an important role in scientific computing as well as in other disciplines. In 1971, Sch\"onhage and Strassen designed an algorithm that improved the multiplication time for two integers of at most $n$ bits to $O(n \log n \log \log n)$. In 2007, Martin F\"urer presented a new algorithm that runs in $O \left(n \log n \cdot 2^{O(\log^* n)} \right)$, where $\log^* n$ is the iterated logarithm of $n$. We explain how to put F\"urer's ideas into practice for multiplying polynomials over a prime field $\mathbb{Z} / p \mathbb{Z}$, where $p$ is a Generalized Fermat prime of the form $p = r^k + 1$, $k$ is a power of $2$, and $r$ is of machine word size. When $k$ is at least 8, we show that multiplication inside such a prime field can be efficiently implemented via the Fast Fourier Transform (FFT). Taking advantage of the Cooley-Tukey tensor formula and the fact that $r$ is a primitive $2k$-th root of unity in $\mathbb{Z} / p \mathbb{Z}$, we obtain an efficient implementation of the FFT over $\mathbb{Z} / p \mathbb{Z}$. This implementation outperforms comparable implementations that use either other encodings of $\mathbb{Z} / p \mathbb{Z}$ or other ways of performing multiplication in $\mathbb{Z} / p \mathbb{Z}$.
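
To illustrate why $r$ being a primitive $2k$-th root of unity helps, the following Python sketch represents an element of $\mathbb{Z} / p \mathbb{Z}$ by $k$ digits in radix $r$ and multiplies it by a power of $r$ using only a rotation of the digits with sign changes, since $r^k \equiv -1 \pmod p$. This is a toy model under stated assumptions, not the BPAS implementation: the parameters $r = 4$, $k = 8$ (so $p = 65537$) are chosen for readability rather than machine-word size, and the function names `to_digits`, `from_digits`, and `mul_by_power_of_r` are hypothetical.

```python
# Toy parameters (assumption): p = r**k + 1 = 65537, a prime with k a power of 2.
R, K = 4, 8
P = R**K + 1


def to_digits(x, r=R, k=K):
    """Write 0 <= x < p as k radix-r digits, least significant first."""
    digits = []
    for _ in range(k - 1):
        x, d = divmod(x, r)
        digits.append(d)
    digits.append(x)  # top digit; it equals r only when x = r**k = p - 1
    return digits


def from_digits(digits, r=R, p=P):
    """Evaluate sum(a_i * r**i) mod p; the digits may be negative."""
    return sum(a * r**i for i, a in enumerate(digits)) % p


def mul_by_power_of_r(digits, j, k=K):
    """Multiply by r**j (0 <= j < 2k) without any modular multiplication.

    Because r**k = -1 (mod p), digit i moves to position (i + j) mod k and
    changes sign once for every wrap past position k - 1.
    """
    out = [0] * k
    for i, a in enumerate(digits):
        wraps, pos = divmod(i + j, k)
        out[pos] = -a if wraps % 2 else a
    return out


if __name__ == "__main__":
    x, j = 12345, 11
    shifted = mul_by_power_of_r(to_digits(x), j)
    print(from_digits(shifted) == (x * pow(R, j, P)) % P)
```

In an FFT over $\mathbb{Z} / p \mathbb{Z}$ whose butterflies use powers of $r$ as twiddle factors, this kind of shift replaces a general modular multiplication; a production version would also renormalize the signed digits back into the range $[0, r)$, which this sketch omits.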
