SIMD re-convergence at thread frontiers
Sudhakar Yalamanchili | Andrew Kerr | Haicheng Wu | Gregory Frederick Diamos | Subramaniam Maiyuran | Benjamin Ashbaugh