Patterns of Inefficient Performance Behavior in GPU Applications
[1] Allen D. Malony, et al. An experimental approach to performance measurement of heterogeneous parallel applications using CUDA, 2010, ICS '10.
[2] Kevin Skadron, et al. A performance study of general-purpose applications on graphics processors using CUDA, 2008, J. Parallel Distributed Comput.
[3] Matthias S. Müller, et al. The Vampir Performance Analysis Tool-Set, 2008, Parallel Tools Workshop.
[4] Wen-mei W. Hwu, et al. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs, 2008, LCPC.
[5] Jeffrey K. Hollingsworth, et al. Grindstone: A Test Suite for Parallel Performance Tools, 1998.
[6] Jie Cheng, et al. Programming Massively Parallel Processors. A Hands-on Approach, 2010, Scalable Comput. Pract. Exp.
[7] Aaftab Munshi, et al. The OpenCL specification, 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[8] Guido Juckeland, et al. High Resolution Program Flow Visualization of Hardware Accelerated Hybrid Multi-core Applications, 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[9] Bernd Mohr, et al. A test suite for parallel performance analysis tools, 2007, Concurr. Comput. Pract. Exp.
[10] Jason Cong, et al. High-performance CUDA kernel execution on FPGAs, 2009, ICS.
[11] Matt Pharr, et al. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, 2005.
[12] Michael Boyer. Automated Dynamic Analysis of CUDA Programs, 2008.
[13] Zeljko Hocenski, et al. Parallel Processing with CUDA in Ceramic Tiles Classification, 2010, KES.