Iterative Sparse Matrix-Vector Multiplication for Integer Factorization on GPUs

The Block Wiedemann (BW) and Block Lanczos (BL) algorithms are frequently used to solve sparse linear systems over GF(2). Iterative sparse matrix-vector multiplication is the most time-consuming operation in both approaches. The need to accelerate this step arises from the application of these algorithms to the very large matrices that occur in the linear algebra step of the Number Field Sieve (NFS) for integer factorization. In this paper we derive an efficient CUDA implementation of this operation using a newly designed hybrid sparse matrix format. For a number of tested NFS matrices, this yields speedups by a factor of 4 to 8 on a single GPU compared to an optimized multicore implementation.
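
To make the core operation concrete, the following is a minimal CPU reference sketch of iterative sparse matrix-vector multiplication over GF(2). It is not the hybrid format or the CUDA kernel described in the paper. It assumes a plain CSR layout without a value array (every stored entry is 1) and a block of vectors packed so that each entry of `x` is an integer whose bits hold one row of the block. The names `spmv_gf2` and `iterate` are hypothetical and chosen only for illustration.

```python
from typing import List


def spmv_gf2(row_ptr: List[int], col_idx: List[int], x: List[int]) -> List[int]:
    """Compute y = B * x over GF(2).

    B is given in CSR form without values: row i has its nonzero
    columns in col_idx[row_ptr[i]:row_ptr[i + 1]].
    Each x[j] is an integer used as a bit vector, so addition
    over GF(2) becomes a bitwise XOR on whole words.
    """
    n_rows = len(row_ptr) - 1
    y = [0] * n_rows
    for i in range(n_rows):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc ^= x[col_idx[k]]  # add (XOR) the selected input word
        y[i] = acc
    return y


def iterate(row_ptr: List[int], col_idx: List[int], x: List[int], steps: int) -> List[int]:
    """Apply B to x repeatedly, returning B^steps * x (B must be square)."""
    for _ in range(steps):
        x = spmv_gf2(row_ptr, col_idx, x)
    return x


# Small 3x3 example: B = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
row_ptr = [0, 2, 4, 6]
col_idx = [0, 1, 1, 2, 0, 2]
x0 = [0b01, 0b10, 0b11]
print(spmv_gf2(row_ptr, col_idx, x0))    # [3, 1, 2]
print(iterate(row_ptr, col_idx, x0, 2))  # [2, 3, 1]
```

Because every nonzero entry equals 1, each output word is just the XOR of the input words selected by one row, so the inner loop is dominated by irregular memory reads of `x`. This access pattern is what makes the choice of sparse matrix format significant for throughput.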
