Accelerating the Explicitly Restarted Arnoldi Method with GPUs Using an Autotuned Matrix Vector Product
This paper presents a parallelized hybrid single-vector Arnoldi algorithm for computing approximate eigenpairs of a nonsymmetric matrix. We are interested in using accelerators and multicore units to speed up the Arnoldi process. The main goal is to propose a parallel version of the Arnoldi solver that can efficiently use multiple multicore processors or multiple graphics processing units (GPUs) in a mixed coarse- and fine-grained fashion. In the proposed algorithms, this is achieved by autotuning the matrix-vector product before starting the Arnoldi eigensolver and by reorganizing the data and global communications so that communication time is reduced. Execution time, performance, and scalability are assessed with well-known dense and sparse test matrices on multiple Nehalem processors, GT200 NVIDIA Tesla GPUs, and next-generation Fermi Tesla GPUs. With one processor, we see a speedup of 2 to 3x when using all the physical cores and a further speedup of 2 to 8x when adding a GPU to this multicore unit, hence a speedup of 4 to 24x compared to the sequential solver.
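The two ideas named in the abstract, timing candidate matrix-vector kernels before the solve and then running an explicitly restarted single-vector Arnoldi iteration with the winner, can be illustrated with a minimal CPU-only sketch. It is not the authors' implementation: the kernels `matvec_blas` and `matvec_rowwise` are hypothetical stand-ins for the tuned multicore and GPU kernels, and the function names, the subspace size `m`, the tolerance, and the choice of the largest-magnitude Ritz pair as the restart vector are all assumptions made for illustration.

```python
import time
import numpy as np

# Hypothetical candidate kernels (assumption: stand-ins for tuned CPU/GPU variants).
def matvec_blas(A, x):
    return A @ x

def matvec_rowwise(A, x):
    return np.array([row @ x for row in A])

def autotune_matvec(A, candidates, trials=3):
    """Time every candidate on A and return the fastest one."""
    x = np.ones(A.shape[1])
    best, best_time = None, float("inf")
    for kernel in candidates:
        start = time.perf_counter()
        for _ in range(trials):
            kernel(A, x)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = kernel, elapsed
    return best

def arnoldi(A, matvec, v0, m):
    """Build an m-step Arnoldi basis V and Hessenberg matrix H (modified Gram-Schmidt)."""
    n = v0.size
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = matvec(A, V[:, j])
        for i in range(j + 1):
            H[i, j] = np.vdot(V[:, i], w)
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if abs(H[j + 1, j]) < 1e-14:  # invariant subspace found early
            return V[:, : j + 1], H[: j + 1, : j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m]

def eram(A, matvec, m=15, tol=1e-8, max_restarts=200, seed=0):
    """Explicitly restarted Arnoldi for the largest-magnitude eigenpair."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0]).astype(complex)
    for _ in range(max_restarts):
        V, H = arnoldi(A, matvec, v, m)
        theta, S = np.linalg.eig(H)          # Ritz values and vectors of H
        k = np.argmax(np.abs(theta))         # wanted Ritz value
        y = V @ S[:, k]
        y = y / np.linalg.norm(y)
        res = np.linalg.norm(matvec(A, y) - theta[k] * y)
        if res < tol * abs(theta[k]):
            break
        v = y                                # explicit restart with the Ritz vector
    return theta[k], y, res

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 30
    # Nonsymmetric test matrix with known eigenvalues 1..n on the diagonal.
    A = np.diag(np.arange(1.0, n + 1)) + 0.01 * np.triu(rng.standard_normal((n, n)), k=1)
    kernel = autotune_matvec(A, [matvec_blas, matvec_rowwise])
    lam, y, res = eram(A, kernel)
    print(kernel.__name__, lam, res)
```

The tuning step runs once, before the eigensolver starts, so its cost is amortized over every matrix-vector product in every restart cycle. In a GPU setting the candidate list would presumably hold kernels that differ in storage format or thread layout rather than these two NumPy variants.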