Towards Metaprogramming for Parallel Systems on a Chip
[1] Aart Johannes Casimir Bik. The software vectorization handbook , 2004 .
[2] Paul H. J. Kelly,et al. Deriving Efficient Data Movement from Decoupled Access/Execute Specifications , 2008, HiPEAC.
[3] Lawrence Snyder,et al. Principles of Parallel Programming , 2008 .
[4] William J. Dally,et al. Sequoia: Programming the Memory Hierarchy , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[5] Aart J. C. Bik. The Software Vectorization Handbook: Applying Multimedia Extensions for Maximum Performance , 2004 .
[6] Wen-mei W. Hwu,et al. Program optimization carving for GPU computing , 2008, J. Parallel Distributed Comput.
[7] Wen-mei W. Hwu,et al. CUDA-Lite: Reducing GPU Programming Complexity , 2008, LCPC.
[8] Aart J. C. Bik. The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance , 2004 .