Automatic Data Layout Transformation for Heterogeneous Many-Core Systems
[1] Yao Zhang, et al. Parallel Computing Experiences with CUDA, 2008, IEEE Micro.
[2] Lars Karlsson, et al. Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion, 2012, TOMS.
[3] David R. Kaeli, et al. Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures, 2011, IEEE Transactions on Parallel and Distributed Systems.
[4] Wen-mei W. Hwu, et al. DL: A data layout transformation system for heterogeneous computing, 2012, 2012 Innovative Parallel Computing (InPar).
[5] Kevin Skadron, et al. Dymaxion: Optimizing memory access patterns for heterogeneous systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[6] Lars Karlsson, et al. Blocked in-place transposition with application to storage format conversion, 2009.
[7] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.