Automatic optimization of thread-coarsening for graphics processors
Michael F. P. O'Boyle | Christophe Dubach | Alberto Magni
[1] Yi Yang,et al. A unified optimizing compiler framework for different GPGPU architectures , 2012, TACO.
[2] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Michael F. P. O'Boyle,et al. Adaptive java optimisation using instance-based learning , 2004, ICS '04.
[4] B. Manly. Multivariate Statistical Methods: A Primer , 1986 .
[5] Scott A. Mahlke,et al. Sponge: portable stream programming on graphics engines , 2011, ASPLOS XVI.
[6] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.
[7] Sudhakar Yalamanchili,et al. Modeling GPU-CPU workloads and systems , 2010, GPGPU-3.
[8] Xipeng Shen,et al. A cross-input adaptive framework for GPU program optimizations , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[9] Michael F. P. O'Boyle,et al. A large-scale cross-architecture evaluation of thread-coarsening , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[10] Fernando Magno Quintão Pereira,et al. Divergence Analysis and Optimizations , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[11] Michael F. P. O'Boyle,et al. Exploiting GPU Hardware Saturation for Fast Compiler Optimization , 2014, GPGPU@ASPLOS.
[12] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, HiPC 2008.
[13] Richard W. Vuduc,et al. A performance analysis framework for identifying potential benefits in GPGPU applications , 2012, PPoPP '12.
[14] Yi Yang,et al. Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement , 2013, ICS '13.
[15] P. Sadayappan,et al. Using machine learning to improve automatic vectorization , 2012, TACO.
[16] Simon Moll. Decompilation of LLVM IR , 2012 .
[17] David F. Bacon,et al. Compiling a high-level language for GPUs: (via language support for architectures and compilers) , 2012, PLDI.
[18] Heekuck Oh,et al. Neural Networks for Pattern Recognition , 1993, Adv. Comput..
[19] Mark Stephenson,et al. Predicting unroll factors using supervised classification , 2005, International Symposium on Code Generation and Optimization.
[20] Sebastian Hack,et al. Improving Performance of OpenCL on CPUs , 2012, CC.
[21] Margaret Martonosi,et al. Starchart: Hardware and software optimization using recursive partitioning regression trees , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[22] Richard W. Vuduc,et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs , 2010, PPoPP '10.
[23] Sally A. McKee,et al. Methods of inference and learning for performance modeling of parallel applications , 2007, PPoPP.
[24] Xipeng Shen,et al. On-the-fly elimination of dynamic irregularities for GPU computing , 2011, ASPLOS XVI.
[25] José Nelson Amaral,et al. Using machines to learn method-specific compilation strategies , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[26] Michael F. P. O'Boyle,et al. Fast compiler optimisation evaluation using code-feature based performance prediction , 2007, CF '07.
[27] Michael F. P. O'Boyle,et al. Partitioning streaming parallelism for multi-cores: A machine learning based approach , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[28] Naga K. Govindaraju,et al. Auto-tuning of fast fourier transform on graphics processors , 2011, PPoPP '11.
[29] Bryan F. J. Manly,et al. Multivariate Statistical Methods: A Primer, Third Edition , 1994 .
[31] Krste Asanovic,et al. Convergence and scalarization for data-parallel architectures , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[32] Apan Qasem,et al. Automatic Restructuring of GPU Kernels for Exploiting Inter-thread Data Locality , 2012, CC.