A ScaLAPACK-Style Algorithm for Reducing a Regular Matrix Pair to Block Hessenberg-Triangular Form

A parallel algorithm for reduction of a regular matrix pair (A, B) to block Hessenberg-triangular form is presented. It is shown how a sequential elementwise algorithm can be reorganized in terms of blocked factorizations and matrix-matrix operations. Moreover, this LAPACK-style algorithm is straightforwardly extended to a parallel algorithm for a rectangular 2D processor grid using parallel kernels from ScaLAPACK. A hierarchical performance model is derived and used for algorithm analysis and selection of optimal blocking parameters and grid sizes.
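
For orientation, the elementwise starting point can be illustrated with the textbook Givens-rotation reduction of a pair (A, B) to unblocked Hessenberg-triangular form. This is a minimal NumPy sketch under stated assumptions, not the blocked or parallel algorithm of this paper: it targets a Hessenberg matrix with a single subdiagonal rather than a block Hessenberg matrix, and the function names `givens` and `hessenberg_triangular` are illustrative only.

```python
import numpy as np


def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def hessenberg_triangular(A, B):
    """Unblocked reduction: Q.T @ A @ Z = H (Hessenberg), Q.T @ B @ Z = T (triangular).

    Textbook elementwise scheme, assumed here for illustration only.
    """
    A = np.array(A, dtype=float)
    B = np.array(B, dtype=float)
    n = A.shape[0]

    # Step 1: make B upper triangular via QR, and apply Q.T to A as well.
    Q, B = np.linalg.qr(B)
    A = Q.T @ A
    Z = np.eye(n)

    # Step 2: zero A below the first subdiagonal, one element at a time.
    for j in range(n - 2):
        for i in range(n - 1, j + 1, -1):
            # Left rotation on rows i-1, i annihilates A[i, j] ...
            c, s = givens(A[i - 1, j], A[i, j])
            G = np.array([[c, s], [-s, c]])
            A[i - 1:i + 1, :] = G @ A[i - 1:i + 1, :]
            B[i - 1:i + 1, :] = G @ B[i - 1:i + 1, :]
            Q[:, i - 1:i + 1] = Q[:, i - 1:i + 1] @ G.T
            A[i, j] = 0.0

            # ... but fills in B[i, i-1]; a right rotation on
            # columns i-1, i restores the triangular form of B.
            c, s = givens(B[i, i], B[i, i - 1])
            M = np.array([[c, s], [-s, c]])
            B[:, i - 1:i + 1] = B[:, i - 1:i + 1] @ M
            A[:, i - 1:i + 1] = A[:, i - 1:i + 1] @ M
            Z[:, i - 1:i + 1] = Z[:, i - 1:i + 1] @ M
            B[i, i - 1] = 0.0

    return A, B, Q, Z
```

Each rotation in this sketch touches only two rows or two columns, so the work is dominated by vector-level operations. That is the property a reorganization into blocked factorizations and matrix-matrix operations is meant to improve.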