Efficient GCD Computation for Big Integers on Xeon Phi Coprocessor

Efficient calculation of the greatest common divisor (GCD) of big integers, each at least 1024 bits long, has drawn considerable attention because it can be used to detect a weakness in the RSA security infrastructure. This paper presents a parallel binary GCD algorithm for big integers and its implementation on the Intel Xeon Phi coprocessor. The algorithm computes GCDs for many pairs of big integers in parallel by using all cores of a Xeon Phi coprocessor, and it uses the coprocessor's vector processing units to speed up the critical integer operations within the algorithm. Using 240 threads on a Xeon Phi coprocessor to compute GCDs for a large number of 2048-bit integers, the implementation achieves a 30-fold speedup over a sequential binary GCD implementation on a single CPU core, and it delivers twice the performance of the same sequential binary GCD implementation running on 240 threads of the Xeon Phi.
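For orientation, the sequential binary GCD that serves as the baseline can be sketched as follows. This is only an illustrative sketch of the textbook binary (Stein) algorithm using Python's arbitrary-precision integers; it is not the paper's vectorized Xeon Phi implementation, and the function name `binary_gcd` and the toy moduli are assumptions chosen for the example.

```python
def binary_gcd(a: int, b: int) -> int:
    """Textbook binary GCD for non-negative integers (illustrative sketch)."""
    if a == 0:
        return b
    if b == 0:
        return a

    # Factor out the powers of two common to both operands.
    shift = 0
    while ((a | b) & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1

    # Make a odd; it stays odd for the rest of the loop.
    while (a & 1) == 0:
        a >>= 1

    while b != 0:
        # Remaining factors of two in b cannot be part of the GCD.
        while (b & 1) == 0:
            b >>= 1
        # Keep a <= b so the subtraction below is non-negative.
        if a > b:
            a, b = b, a
        b -= a  # odd - odd = even, so b shrinks on the next pass

    # Restore the common powers of two.
    return a << shift


# Toy version of the RSA weakness: two moduli that share a prime factor.
# Real moduli would be 1024 bits or longer; these small primes are hypothetical.
shared_prime = 101
n1 = shared_prime * 103
n2 = shared_prime * 107
factor = binary_gcd(n1, n2)
```

The algorithm uses only shifts, comparisons, and subtractions, which is what makes it a natural fit for multi-word integer arithmetic on vector units. In the toy example, a GCD greater than 1 between two moduli exposes their shared prime and therefore factors both of them.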
