The one-sided block Jacobi (OSBJ) method is known to be an efficient algorithm for computing the singular value decomposition (SVD). In this paper, we evaluate the performance of the most recent variant of the OSBJ method, which uses dynamic ordering and variable blocking, on the Fujitsu FX10 parallel computer. Analysis of the performance results identifies two bottlenecks: the weight computation for ordering and the diagonalization of \(2\times 2\) block matrices. To remove these bottlenecks, we propose new implementations of both tasks. Experimental results show that they are effective and achieve a speedup of up to 1.6 times in total. As a result, our OSBJ solver computes the SVD of matrices of order 2048 to 8192 on 12 to 48 nodes of the FX10 more than three times faster than ScaLAPACK PDGESVD.
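For readers unfamiliar with the underlying idea, the following is a minimal sketch of the classical scalar one-sided Jacobi (Hestenes) SVD, which the block method generalizes. It is not the solver evaluated here: it is serial, rotates single columns rather than column blocks, and uses a fixed cyclic ordering instead of dynamic ordering and variable blocking. The function name `one_sided_jacobi_svd` and its parameters are hypothetical, and the input is assumed to be a real matrix with full column rank.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=60):
    """Scalar one-sided Jacobi SVD sketch (assumes real A with full column rank)."""
    U = np.array(A, dtype=float)      # working copy; its columns become mutually orthogonal
    n = U.shape[1]
    V = np.eye(n)                     # accumulates the rotations, so that A @ V == U
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):        # fixed cyclic ordering over column pairs
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                # Skip pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of columns p and q.
                zeta = (beta - alpha) / (2.0 * gamma)
                if zeta == 0.0:
                    t = 1.0
                else:
                    t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same plane rotation to the columns of U and V.
                for M in (U, V):
                    mp = M[:, p].copy()
                    M[:, p] = c * mp - s * M[:, q]
                    M[:, q] = s * mp + c * M[:, q]
        if not rotated:               # a full sweep without rotations: converged
            break
    sigma = np.linalg.norm(U, axis=0) # singular values are the final column norms
    order = np.argsort(sigma)[::-1]   # sort in descending order
    sigma = sigma[order]
    U = U[:, order] / sigma           # normalize columns to get left singular vectors
    V = V[:, order]
    return U, sigma, V
```

A call such as `U, sigma, V = one_sided_jacobi_svd(A)` returns factors satisfying `A ≈ U @ np.diag(sigma) @ V.T`. In the block variant, each column pair is replaced by a pair of column blocks, and the choice of which pairs to orthogonalize next is what the ordering weights determine.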