Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures
David R. Kaeli | Dana Schaa | Perhaad Mistry | Byunghyun Jang
[1] Erik Lindholm, et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.
[2] Jens H. Krüger, et al. GPGPU: general purpose computation on graphics hardware, 2004, SIGGRAPH '04.
[3] Jack J. Dongarra, et al. A comparative study of automatic vectorizing compilers, 1991, Parallel Computing.
[4] David R. Kaeli, et al. Data transformations enabling loop vectorization on multithreaded data parallel architectures, 2010, PPoPP '10.
[5] Pat Hanrahan, et al. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication, 2004, Graphics Hardware.
[6] Wonyong Sung, et al. Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware, 2008, CASES '08.
[7] John Zahorjan, et al. Optimizing Data Locality by Array Restructuring, 1995.
[8] Sharad Malik, et al. Cache miss equations: an analytical representation of cache misses, 1997, ICS '97.
[9] John D. Owens, et al. GPU Computing, 2008, Proceedings of the IEEE.
[10] David R. Kaeli, et al. Architecture-aware optimization targeting multithreaded stream computing, 2009, GPGPU-2.
[11] Anjul Patney, et al. Efficient computation of sum-products on GPUs through software-managed cache, 2008, ICS '08.
[12] David R. Kaeli, et al. Multi GPU implementation of iterative tomographic reconstruction algorithms, 2009, 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro.
[13] Wonyong Sung, et al. Access-Pattern-Aware On-Chip Memory Allocation for SIMD Processors, 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[14] Uday Bondhugula, et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, 2008, PPoPP.
[15] N. K. Govindaraju, et al. A Memory Model for Scientific Algorithms on Graphics Processors, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[16] Ronald Fedkiw, et al. Robust quasistatic finite elements and flesh simulation, 2005, SCA '05.
[17] Corinna G. Lee, et al. Simple vector microprocessors for multimedia applications, 1998, Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[18] Robert A. van de Geijn, et al. BLAS (Basic Linear Algebra Subprograms), 2011, Encyclopedia of Parallel Computing.
[19] NVIDIA Corporation. CUDA Programming Guide 1.1, 2007.