Optimizing matrix transposes using a POWER7 cache model and explicit prefetching
[1] David A. Patterson et al. Computer Architecture, Fifth Edition: A Quantitative Approach, 2011.
[2] David A. Patterson et al. Computer Architecture: A Quantitative Approach, 1969.
[3] Balaram Sinharoy et al. IBM POWER7 multicore server processor, 2011.
[4] David A. Patterson et al. Computer Architecture: A Quantitative Approach (4th ed.), 2007.
[5] John McCalpin et al. Automatic benchmark generation for cache optimization of matrix operations, 1995, ACM-SE 33.
[6] Siddhartha Chatterjee et al. Cache-efficient matrix transposition, 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture (HPCA-6) (Cat. No. PR00550).
[7] Charles E. Leiserson et al. Cache-Oblivious Algorithms, 2003, CIAC.
[8] Ramakrishnan Rajamony et al. PERCS: The IBM POWER7-IH high-performance computing system, 2011, IBM J. Res. Dev.
[9] Balaram Sinharoy et al. POWER7: IBM's next generation server processor, 2010, 2009 IEEE Hot Chips 21 Symposium (HCS).