Adaptive GPU Array Layout Auto-Tuning
[1] Amnon Barak, et al. Memory access patterns: the missing piece of the multi-GPU puzzle, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Kenneth E. Batcher, et al. Sorting networks and their applications, 1968, AFIPS Spring Joint Computing Conference.
[3] Michael Goesele, et al. Guided profiling for auto-tuning array layouts on GPUs, 2015, PMBS '15.
[4] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[5] Scott A. Mahlke, et al. ELF: maximizing memory-level parallelism for GPUs with coordinated warp and fetch scheduling, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Shoaib Kamil, et al. OpenTuner: An extensible framework for program autotuning, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[7] Michael Garland, et al. Nitro: A Framework for Adaptive Code Variant Tuning, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[8] Thomas Fahringer, et al. Automatic Data Layout Optimizations for GPUs, 2015, Euro-Par.
[9] Xipeng Shen, et al. A cross-input adaptive framework for GPU program optimizations, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[10] Hans Henrik Brandenborg Sørensen, et al. Auto-tuning Dense Vector and Matrix-Vector Operations for Fermi GPUs, 2011, PPAM.
[11] I-Hsin Chung, et al. Using Information from Prior Runs to Improve Automated Tuning Systems, 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[12] David D. Cox, et al. Machine learning for predictive auto-tuning with boosted regression trees, 2012, 2012 Innovative Parallel Computing (InPar).
[13] Michael Goesele, et al. Information-theoretic analysis of molecular (co)evolution using graphics processing units, 2012, ECMLS '12.
[14] Henk Corporaal, et al. Adaptive and transparent cache bypassing for GPUs, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[15] Frank Mueller, et al. Auto-generation and auto-tuning of 3D stencil codes on GPU clusters, 2012, CGO '12.
[16] Robert L. Cook, et al. The Reyes image rendering architecture, 1987, SIGGRAPH.
[17] Xipeng Shen, et al. An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations, 2012, International Journal of Parallel Programming.