Paper information - Cache oblivious storage and access heuristics for blocked matrix-matrix multiplication

The authors investigate the effects of ordering in blocked matrix-matrix multiplication. They find that submatrices do not have to be stored contiguously in memory to achieve near-optimal performance. They also find that a good choice of execution order for the submatrix operations can yield a speedup of up to four times for small block sizes.
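To make the two ideas in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the helper names (`split_into_blocks`, `morton_key`, `blocked_matmul`, `join_blocks`) are hypothetical, and the Z-order (Morton) traversal is only one assumed example of an execution order, which the abstract does not specify. Each block is allocated separately, so blocks are not contiguous with one another, and the order of the block operations is passed in as a parameter.

```python
from itertools import product


def split_into_blocks(m, bs):
    """Copy a square matrix (list of lists) into separately allocated bs x bs blocks."""
    nb = len(m) // bs
    return {
        (bi, bj): [row[bj * bs:(bj + 1) * bs] for row in m[bi * bs:(bi + 1) * bs]]
        for bi in range(nb)
        for bj in range(nb)
    }


def morton_key(triple, bits=8):
    """Interleave the bits of (i, j, k) to get a Z-order index (assumed ordering)."""
    i, j, k = triple
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (3 * b + 2)
        key |= ((j >> b) & 1) << (3 * b + 1)
        key |= ((k >> b) & 1) << (3 * b)
    return key


def blocked_matmul(a_blocks, b_blocks, nb, bs, order):
    """Accumulate C[i][j] += A[i][k] * B[k][j] over block triples in the given order."""
    c_blocks = {
        (i, j): [[0] * bs for _ in range(bs)]
        for i in range(nb)
        for j in range(nb)
    }
    for i, j, k in order:
        a, b, c = a_blocks[(i, k)], b_blocks[(k, j)], c_blocks[(i, j)]
        for r in range(bs):
            for t in range(bs):
                a_rt = a[r][t]
                for s in range(bs):
                    c[r][s] += a_rt * b[t][s]
    return c_blocks


def join_blocks(blocks, nb, bs):
    """Reassemble a dense matrix from its blocks."""
    n = nb * bs
    return [
        [blocks[(r // bs, c // bs)][r % bs][c % bs] for c in range(n)]
        for r in range(n)
    ]


if __name__ == "__main__":
    n, bs = 4, 2
    nb = n // bs
    A = [[r * n + c for c in range(n)] for r in range(n)]
    B = [[(r + 2 * c) % 5 for c in range(n)] for r in range(n)]
    triples = list(product(range(nb), repeat=3))
    z_order = sorted(triples, key=morton_key)  # one possible execution order
    C = join_blocks(
        blocked_matmul(split_into_blocks(A, bs), split_into_blocks(B, bs), nb, bs, z_order),
        nb,
        bs,
    )
    print(C)
```

Any permutation of the block triples gives the same product, so the execution order is purely a performance choice. Pure Python will not show the cache effects the abstract reports; the sketch only illustrates where the storage layout and the execution order enter the computation.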
