Memory-level and Thread-level Parallelism Aware GPU Architecture Performance Analytical Model
[1] Stéphan Jourdan, et al. Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors, 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).
[2] Dawid Pajak. General-Purpose Computation Using Graphics Hardware for Fast HDR Image Processing, 2007.
[3] Edward T. Grochowski, et al. Larrabee: A many-core x86 architecture for visual computing, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[4] David R. Kaeli, et al. Exploring the multiple-GPU design space, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[5] Kevin Skadron, et al. Increasing memory miss tolerance for SIMD cores, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[6] Xiuwen Liu, et al. Face detection using spectral histograms and SVMs, 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[7] Weiguo Liu, et al. Performance Predictions for General-Purpose Computation on GPUs, 2007, 2007 International Conference on Parallel Processing (ICPP 2007).
[8] William Gropp, et al. An adaptive performance modeling tool for GPU architectures, 2010, PPoPP '10.
[9] Richard W. Vuduc, et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs, 2010, PPoPP '10.
[10] David E. Culler, et al. An Analytical Solution for a Markov Chain Modeling Multithreaded Execution, 1991.
[11] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[12] Randi J. Rost. OpenGL shading language, 2004.
[13] John Paul Shen, et al. Theoretical modeling of superscalar processor performance, 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.
[14] K. Srinathan, et al. A performance prediction model for the CUDA GPGPU platform, 2009, 2009 International Conference on High Performance Computing (HiPC).
[15] Teresa H. Y. Meng, et al. Merge: a programming model for heterogeneous multi-core systems, 2008, ASPLOS.
[16] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[17] N. K. Govindaraju, et al. A Memory Model for Scientific Algorithms on Graphics Processors, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[18] Tor M. Aamodt, et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[19] Jim X. Chen, et al. OpenGL Shading Language, 2009.
[20] Erik Lindholm, et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.
[21] Wen-mei W. Hwu, et al. Program optimization space pruning for a multithreaded GPU, 2008, CGO '08.
[22] Mary K. Vernon, et al. Analytic evaluation of shared-memory systems with ILP processors, 1998, ISCA.
[23] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[24] J. Xu. OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems, 2009.
[25] Kevin Skadron, et al. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs, 2009, ICS.
[26] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[27] Hyesoon Kim, et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[28] Pierre Michaud, et al. Data-flow prescheduling for large instruction windows in out-of-order processors, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[29] Tor M. Aamodt, et al. A first-order fine-grained multithreaded throughput model, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[30] James E. Smith, et al. A first-order superscalar processor model, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.