Optimization space pruning without regrets
[1] Markus Püschel,et al. Bandit-based optimization on graphs with application to library performance tuning , 2009, ICML '09.
[2] Gang Ren,et al. A comparison of empirical and model-driven optimization , 2003, PLDI '03.
[3] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[4] Shoaib Kamil,et al. OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[5] Uday Bondhugula,et al. Loop transformations: convexity, pruning and optimization , 2011, POPL '11.
[6] Markus Püschel,et al. A Basic Linear Algebra Compiler , 2014, CGO '14.
[7] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[8] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[9] Michael F. P. O'Boyle,et al. Using machine learning to focus iterative optimization , 2006, International Symposium on Code Generation and Optimization (CGO'06).
[10] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[11] André Seznec,et al. Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[12] Jack J. Dongarra,et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems , 2009, Parallel Comput..
[13] Dinakar Dhurjati,et al. Scaling up Superoptimization , 2016, ASPLOS.
[14] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[15] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[16] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.
[17] Margaret Martonosi,et al. Starchart: Hardware and software optimization using recursive partitioning regression trees , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[18] Scott A. Mahlke,et al. Adaptive input-aware compilation for graphics engines , 2012, PLDI '12.
[19] J. Demmel,et al. Sun Microsystems , 1996 .
[20] Francky Catthoor,et al. Polyhedral parallel code generation for CUDA , 2013, TACO.
[21] William Gropp,et al. An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.
[22] Michel Steuwer,et al. Performance portable GPU code generation for matrix multiplication , 2016, GPGPU@PPoPP.
[23] Jack J. Dongarra,et al. A Note on Auto-tuning GEMM for GPUs , 2009, ICCS.