Impact of Warp Formation on GPU Performance
暂无分享,去创建一个
[1] Michael J. Flynn,et al. Very high-speed computing systems , 1966 .
[2] Erik Lindholm,et al. A user-programmable vertex engine , 2001, SIGGRAPH.
[3] Norman P. Jouppi,et al. Available instruction-level parallelism for superscalar and superpipelined machines , 1989, ASPLOS III.
[4] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[5] Adam Levinthal,et al. Chap - a SIMD graphics processor , 1984, SIGGRAPH.
[6] Vikas Agarwal,et al. Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[7] Pat Hanrahan,et al. Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..
[8] Tor M. Aamodt,et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[9] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[10] Andrew Brownsword,et al. Hardware transactional memory for GPU architectures , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[11] Kevin Skadron,et al. A performance study of general-purpose applications on graphics processors using CUDA , 2008, J. Parallel Distributed Comput..
[12] Quinn Jacobson,et al. A study of control independence in superscalar processors , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[13] Jukka Helin,et al. Performance analysis of the CM-2, a massively parallel SIMD computer , 1992, ICS '92.