A Performance Model of Dense Matrix Operations on Many-Core Architectures
Dongrui Fan | Wei Lin | Guoping Long | Junchao Zhang | Fenglong Song | Nan Yuan