QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism

Quadtree matrices in Morton-order storage provide natural blocking at every level of the memory hierarchy. Writing the natural recursive algorithms to exploit this blocking yields code that respects the memory hierarchy without any transformation of the source code. Furthermore, divide-and-conquer algorithms break a problem into independent subproblems, which can be dispatched in parallel for straightforward parallel processing. As a proof of concept, we present an algorithm for QR factorization based on Givens rotations for quadtree matrices in Morton-order storage. The algorithms perform well, competing with and even beating the equivalent LAPACK routine.
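
The following minimal Python sketch shows how the storage scheme works. It assumes a square matrix whose order is a power of two, given row-major as a list of lists. Bit interleaving is the standard way to compute a Morton (Z-order) index. The helper names `morton_index`, `to_morton`, `quadrant`, and `morton_trace` are hypothetical. The recursive trace only illustrates the divide-and-conquer style and is not code from the paper.

```python
def morton_index(row: int, col: int) -> int:
    """Interleave the bits of row and col (row bits in the odd positions), so the
    two top bits of the result select the quadrant: 0=NW, 1=NE, 2=SW, 3=SE."""
    index, bit = 0, 0
    while (row >> bit) or (col >> bit):
        index |= ((col >> bit) & 1) << (2 * bit)
        index |= ((row >> bit) & 1) << (2 * bit + 1)
        bit += 1
    return index


def to_morton(matrix: list[list[float]]) -> list[float]:
    """Copy an n x n matrix (n a power of two) into one flat Morton-ordered list."""
    n = len(matrix)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    storage = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            storage[morton_index(i, j)] = matrix[i][j]
    return storage


def quadrant(storage: list[float], q: int) -> list[float]:
    """Quadrant q occupies a contiguous quarter of the storage and is itself a
    Morton-ordered matrix of half the order, so recursion needs no index arithmetic."""
    size = len(storage) // 4
    return storage[q * size:(q + 1) * size]


def morton_trace(storage: list[float]) -> float:
    """Divide-and-conquer trace: only the NW and SE quadrants contribute,
    and the two recursive calls do not depend on each other."""
    if len(storage) == 1:
        return storage[0]
    return morton_trace(quadrant(storage, 0)) + morton_trace(quadrant(storage, 3))
```

Take `A = [[4 * i + j for j in range(4)] for i in range(4)]`. Then `to_morton(A)` gives `[0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]`, where each run of four is one 2 x 2 block, and `quadrant(..., 1)` returns the NE block `[2, 3, 6, 7]`. Because every quadrant, at every depth, is a contiguous slice, the same layout gives blocks sized for registers, cache lines, and pages without any tuning parameter. Independent recursive calls such as the two in `morton_trace` are natural candidates for parallel dispatch.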
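
The next sketch shows QR factorization by Givens rotations. It works on an ordinary row-major matrix rather than the quadtree structure described above, and it uses the textbook column-by-column scheme, not the authors' recursive algorithm. The function names `givens`, `rotate_rows`, `rotate_cols`, and `givens_qr` are hypothetical.

```python
import math


def givens(a: float, b: float) -> tuple[float, float]:
    """Return (c, s) with c*a + s*b = r and -s*a + c*b = 0, where r = hypot(a, b)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def rotate_rows(M, p, q, c, s):
    """Left-multiply M by the plane rotation acting on rows p and q."""
    for k in range(len(M[0])):
        x, y = M[p][k], M[q][k]
        M[p][k] = c * x + s * y
        M[q][k] = -s * x + c * y


def rotate_cols(M, p, q, c, s):
    """Right-multiply M by the transpose of that rotation (used to accumulate Q)."""
    for k in range(len(M)):
        x, y = M[k][p], M[k][q]
        M[k][p] = c * x + s * y
        M[k][q] = -s * x + c * y


def givens_qr(A):
    """Return (Q, R) with A = Q R, Q orthogonal (m x m), R upper triangular (m x n)."""
    m, n = len(A), len(A[0])
    R = [list(map(float, row)) for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(min(m - 1, n)):
        # Zero column j from the bottom up, rotating each entry into the row above.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            rotate_rows(R, i - 1, i, c, s)
            rotate_cols(Q, i - 1, i, c, s)
    return Q, R
```

For `A = [[3.0, 1.0], [4.0, 2.0]]`, a single rotation with c = 0.6 and s = 0.8 gives R approximately `[[5.0, 2.2], [0.0, 0.4]]`. Each rotation reads and writes only two rows, so rotations on disjoint row pairs are independent. That locality is one reason Givens rotations suit a recursive, blocked, parallel formulation.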

Jeremy D. Frens | David S. Wise
