Autotuning GPU Kernels via Static and Predictive Analysis
[1] Mark Stephenson, et al. Predicting unroll factors using supervised classification, 2005, International Symposium on Code Generation and Optimization.
[2] Eric Petit, et al. CERE: LLVM-Based Codelet Extractor and REplayer for Piecewise Benchmarking and Optimization, 2015, TACO.
[3] François Bodin, et al. A Machine Learning Approach to Automatic Production of Compiler Heuristics, 2002, AIMSA.
[4] William Gropp, et al. Annotations for Productivity and Performance Portability, 2007.
[5] Michael F. P. O'Boyle, et al. Milepost GCC: Machine Learning Enabled Self-tuning Compiler, 2011, International Journal of Parallel Programming.
[6] William Jalby, et al. Is Source-Code Isolation Viable for Performance Characterization?, 2013, 2013 42nd International Conference on Parallel Processing.
[7] Allen D. Malony, et al. Identifying Optimization Opportunities Within Kernel Execution in GPU Codes, 2015, Euro-Par Workshops.
[8] Satoshi Matsuoka, et al. Auto-tuning 3-D FFT library for CUDA GPUs, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[9] Isaac D. Scherson, et al. Computationally Efficient Multiplexing of Events on Hardware Counters, 2014.
[10] Nathan R. Tallent, et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs, 2010, Concurr. Comput. Pract. Exp.
[11] Mary W. Hall, et al. CHiLL: A Framework for Composing High-Level Loop Transformations, 2007.
[12] Dong Li, et al. Application Characterization Using Oxbow Toolkit and PADS Infrastructure, 2014, 2014 Hardware-Software Co-Design for High Performance Computing.
[13] Allen D. Malony, et al. Toward multi-target autotuning for accelerators, 2014, 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
[14] Jack J. Dongarra, et al. Dense linear algebra solvers for multicore with GPU accelerators, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[15] Allen D. Malony, et al. The Tau Parallel Performance System, 2006, Int. J. High Perform. Comput. Appl.
[16] Sunita Chandrasekaran, et al. An Analytical Model-Based Auto-tuning Framework for Locality-Aware Loop Scheduling, 2016, ISC.
[17] P. Sadayappan, et al. Annotation-based empirical performance tuning using Orio, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[18] Allen D. Malony, et al. Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs, 2011, 2011 International Conference on Parallel Processing.
[19] Boyana Norris, et al. Autotuning Stencil-Based Computations on GPUs, 2012, 2012 IEEE International Conference on Cluster Computing.