Architecture-Aware Mapping and Optimization on a 1600-Core GPU
[1] Andreas Moshovos, et al. Demystifying GPU microarchitecture through microbenchmarking, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[2] Bixia Zheng, et al. Twin Peaks: A Software Platform for Heterogeneous Computing on General-Purpose and Graphics Processors, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[3] David R. Kaeli, et al. Architecture-aware optimization targeting multithreaded stream computing, 2009, GPGPU-2.
[4] Wang Gui-bin, et al. Optimizing stencil application on multi-thread GPU architecture using stream programming model, 2010, ARCS 2010.
[5] Wen-mei W. Hwu, et al. Program optimization space pruning for a multithreaded GPU, 2008, CGO '08.
[6] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] Sean Rul, et al. An experimental study on performance portability of OpenCL kernels, 2010, HiPC 2010.
[8] Wen-mei W. Hwu, et al. Program optimization carving for GPU computing, 2008, J. Parallel Distributed Comput.
[9] Pat Hanrahan, et al. Brook for GPUs: stream computing on graphics hardware, 2004, ACM Trans. Graph.
[10] David Kirk, et al. NVIDIA CUDA software and GPU parallel computing architecture, 2007, ISMM '07.
[11] Wu-chun Feng, et al. Multi-dimensional characterization of temporal data mining on graphics processors, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[12] Philippas Tsigas, et al. On dynamic load balancing on graphics processors, 2008, GH '08.
[13] Wen-mei W. Hwu, et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, 2008, PPoPP.
[14] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[15] John D. Owens, et al. GPU Computing, 2008, Proceedings of the IEEE.
[16] Wu-chun Feng, et al. Accelerating electrostatic surface potential calculation with multi-scale approximation on graphics processing units, 2010, Journal of molecular graphics & modelling.
[17] Sam S. Stone, et al. Program Optimization Study on a 128-Core GPU, 2011.
[18] Andrew T. Fenley, et al. An analytical approach to computing biomolecular electrostatic potential. II. Validation and applications, 2008, The Journal of chemical physics.
[19] Samuel Williams, et al. Auto-tuning performance on multicore computers, 2008.