Applying OOC Techniques in the Reduction to Condensed Form for Very Large Symmetric Eigenproblems on GPUs

In this paper we address the reduction of a dense symmetric matrix to tridiagonal form for the solution of symmetric eigenvalue problems on a graphics processor (GPU) when the data is too large to fit into the accelerator memory. We apply out-of-core techniques to a three-stage algorithm, carefully redesigning the first stage to reduce the number of data transfers between the CPU and GPU memory spaces, keep the memory requirements on the GPU within the available device capacity, and attain high performance by exposing a high ratio of computation to communication.
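To make the first stage concrete, here is a minimal in-core sketch. It assumes, as in successive band reduction, that this stage reduces the dense symmetric matrix to band form with semi-bandwidth `b` through two-sided Householder transformations. The helpers `householder` and `reduce_to_band` are hypothetical names used only for illustration, and the sketch is unblocked and pure Python. It does not model the CPU/GPU data staging that the out-of-core design is concerned with. It only shows the arithmetic being staged.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def householder(x: List[float]) -> Tuple[List[float], float, float]:
    """Hypothetical helper: return (v, beta, alpha) such that
    (I - beta * v v^T) x = alpha * e_1."""
    norm_x = math.sqrt(sum(t * t for t in x))
    if norm_x == 0.0:
        return [0.0] * len(x), 0.0, 0.0  # nothing to annihilate
    alpha = -norm_x if x[0] >= 0 else norm_x  # sign choice avoids cancellation
    v = list(x)
    v[0] -= alpha
    beta = 2.0 / sum(t * t for t in v)
    return v, beta, alpha


def reduce_to_band(a: Matrix, b: int) -> Matrix:
    """Hypothetical sketch of the first stage: orthogonal similarity
    A <- H A H that leaves a symmetric matrix with semi-bandwidth b."""
    n = len(a)
    A = [row[:] for row in a]  # work on a copy
    for j in range(n - b - 1):
        r = j + b  # first row of column j allowed to stay nonzero below the band
        x = [A[i][j] for i in range(r, n)]
        v, beta, alpha = householder(x)
        if beta == 0.0:
            continue
        m = len(v)
        # Left application: rows r..n-1 <- H * rows r..n-1
        for c in range(n):
            s = sum(v[k] * A[r + k][c] for k in range(m))
            for k in range(m):
                A[r + k][c] -= beta * s * v[k]
        # Right application: columns r..n-1 <- columns r..n-1 * H
        for i in range(n):
            s = sum(A[i][r + k] * v[k] for k in range(m))
            for k in range(m):
                A[i][r + k] -= beta * s * v[k]
        # Store the exact result of the reflection in column j and row j
        for k in range(m):
            val = alpha if k == 0 else 0.0
            A[r + k][j] = val
            A[j][r + k] = val
    return A
```

With `b = 1` the same routine yields tridiagonal form directly (the classic one-stage reduction); with a larger `b` it plays the role assumed here for the first stage. In a practical blocked variant, the reflectors of a panel of `b` columns are aggregated so that the trailing update becomes a symmetric rank-2b update (level-3 BLAS). That is the kind of operation whose computation-to-communication ratio grows with the block size, which is what makes it attractive to stream to the GPU.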