Paper information - Efficient Reduction from Block Hessenberg Form to Hessenberg Form Using Shared Memory

A new cache-efficient algorithm for reduction from block Hessenberg form to Hessenberg form is presented and evaluated. The algorithm targets parallel computers with shared memory. One level of look-ahead in combination with a dynamic load-balancing scheme significantly reduces the idle time and allows the use of coarse-grained tasks. The coarse tasks lead to high-performance computations on each processor/core. Speedups close to 13 over the sequential unblocked algorithm have been observed on a dual quad-core machine using one thread per core.
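The abstract does not define the two matrix forms, so the following minimal sketch assumes the standard definitions: a matrix is in block Hessenberg form with r subdiagonals when every entry more than r positions below the main diagonal is zero, and Hessenberg form is the special case r = 1. The function names and the `tol` parameter are illustrative only and do not come from the paper.

```python
def lower_bandwidth(a, tol=0.0):
    """Return the largest i - j for which |a[i][j]| > tol (0 if upper triangular)."""
    band = 0
    for i in range(len(a)):
        for j in range(i):  # entries strictly below the diagonal
            if abs(a[i][j]) > tol:
                band = max(band, i - j)
    return band


def is_block_hessenberg(a, r, tol=0.0):
    """True if a has at most r nonzero subdiagonals (assumed definition)."""
    return lower_bandwidth(a, tol) <= r


def is_hessenberg(a, tol=0.0):
    """Hessenberg form is the r = 1 case: one nonzero subdiagonal at most."""
    return is_block_hessenberg(a, 1, tol)


# Hypothetical 5x5 input with r = 2 nonzero subdiagonals.
n, r = 5, 2
block_h = [[1.0 if i - j <= r else 0.0 for j in range(n)] for i in range(n)]

print(lower_bandwidth(block_h))         # number of nonzero subdiagonals
print(is_block_hessenberg(block_h, r))  # input form of the reduction
print(is_hessenberg(block_h))           # target form, not yet reached
```

Under these assumptions, the reduction described above takes a matrix that passes `is_block_hessenberg(a, r)` and produces a similar matrix that passes `is_hessenberg`.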

Lars Karlsson | Bo Kågström

[1] H. Rutishauser. On Jacobi rotation patterns, 1963.

[2] K. Murata, et al. A New Method for the Tridiagonalization of the Symmetric Band Matrix, 1975.

[3] B. Kågström, et al. Blocked algorithms for the reduction to Hessenberg-triangular form revisited, 2008.

[4] Sraban Kumar Mohanty. I/O Efficient Algorithms for Matrix Computations, 2010, ArXiv.

[5] Christian H. Bischof, et al. A framework for symmetric band reduction, 2000, TOMS.

[6] Jack Dongarra, et al. Scheduling two-sided transformations using tile algorithms on multicore architectures, 2010.

[7] Jack Dongarra, et al. Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing, 2009.

[8] Krister Dackland, et al. Blocked algorithms and software for reduction of a regular matrix pair to generalized Schur form, 1999, TOMS.

[9] Krister Dackland, et al. Parallel Two-Stage Reduction of a Regular Matrix Pair to Hessenberg-Triangular Form, 2000, PARA.