A Benchmark Set of Highly-efficient CUDA and OpenCL Kernels and its Dynamic Autotuning with Kernel Tuning Toolkit
Siegfried Benkner | Jiri Filipovic | Jana Hozzová | David Strelák | Filip Petrovic | Jaroslav Olha | Richard Trembecký
[1] José María Carazo,et al. A fast iterative convolution weighting approach for gridding-based direct Fourier three-dimensional reconstruction with correction for the contrast transfer function , 2015, Ultramicroscopy.
[2] Christoph Kessler,et al. Towards a Tunable Multi-Backend Skeleton Programming Framework for Multi-GPU Systems , 2012 .
[3] Jiri Filipovic,et al. Autotuning of OpenCL Kernels with Global Optimizations , 2017, ANDARE '17.
[4] Ben van Werkhoven,et al. Kernel Tuner: A search-optimizing GPU code auto-tuner , 2019, Future Gener. Comput. Syst.
[5] Sergei Gorlatch,et al. ATF: A Generic Auto-Tuning Framework , 2017, 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[6] Michael Garland,et al. Nitro: A Framework for Adaptive Code Variant Tuning , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[7] Jack J. Dongarra,et al. A comparison of search heuristics for empirical code optimization , 2008, 2008 IEEE International Conference on Cluster Computing.
[8] Prasanna Balaprakash,et al. Autotuning in High-Performance Computing Applications , 2018, Proceedings of the IEEE.
[9] Lifan Xu,et al. Auto-tuning a high-level language targeted to GPU codes , 2012, 2012 Innovative Parallel Computing (InPar).
[10] Shoaib Kamil,et al. OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[11] Ludek Matyska,et al. Optimizing CUDA code by kernel fusion: application on BLAS , 2013, The Journal of Supercomputing.
[12] Michael Gerndt,et al. Tuning OpenCL Applications with the Periscope Tuning Framework , 2016, 2016 49th Hawaii International Conference on System Sciences (HICSS).
[13] D S Goodsell,et al. Automated docking of flexible ligands: Applications of autodock , 1996, Journal of molecular recognition : JMR.
[14] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[15] Jack J. Dongarra,et al. High-Performance Matrix-Matrix Multiplications of Very Small Matrices , 2016, Euro-Par.
[16] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[17] Stanislav G. Sedukhin,et al. Performance Tuning of Matrix Multiplication in OpenCL on Different GPUs and CPUs , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[18] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.
[19] José María Carazo,et al. A GPU acceleration of 3-D Fourier reconstruction in cryo-EM , 2019, Int. J. High Perform. Comput. Appl.
[20] Gianluca Palermo,et al. SOCRATES — A seamless online compiler and system runtime autotuning framework for energy-aware applications , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[21] Eduardo Cesar Galobardes,et al. Automatic Tuning of HPC Applications. The Periscope Tuning Framework , 2015 .
[22] Anna Sikora,et al. AutoTune: A Plugin-Driven Approach to the Automatic Tuning of Parallel Applications , 2012, PARA.
[23] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[24] Cedric Nugteren,et al. CLTune: A Generic Auto-Tuner for OpenCL Kernels , 2015, 2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip.
[25] John K. Reid,et al. The Multifrontal Solution of Indefinite Sparse Symmetric Linear , 1983, TOMS.
[26] Jack J. Dongarra,et al. A Note on Auto-tuning GEMM for GPUs , 2009, ICCS.
[27] Yiqun Liu,et al. MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs , 2013, Journal of Computer Science and Technology.
[28] Simon D. Hammond,et al. Revisiting Online Autotuning for Sparse-Matrix Vector Multiplication Kernels on Next-Generation Architectures , 2017, 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[29] Michel Steuwer,et al. LIFT: A functional data-parallel IR for high-performance GPU code generation , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[30] Chun Chen,et al. A Programming Language Interface to Describe Transformations and Code Generation , 2010, LCPC.
[31] Dominik Grewe,et al. Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation , 2011, GPGPU-4.
[32] Klaus Schulten,et al. Accelerating Molecular Modeling Applications with GPU Computing , 2009 .
[33] Chris Cummins,et al. End-to-End Deep Learning of Optimization Heuristics , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[34] Ananta Tiwari,et al. Online Adaptive Code Generation and Tuning , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[35] Sergei Gorlatch,et al. ATF: A generic directive‐based auto‐tuning framework , 2019, Concurr. Comput. Pract. Exp.
[36] Michael Garland,et al. Architecture-Adaptive Code Variant Tuning , 2016, ASPLOS.
[37] Karl Ljungkvist. Matrix-Free Finite-Element Operator Application on Graphics Processing Units , 2014, Euro-Par Workshops.
[38] Anne C. Elster,et al. Machine Learning Based Auto-Tuning for Enhanced OpenCL Performance Portability , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[39] Matthew L. Baker,et al. An atomic model of brome mosaic virus using direct electron detection and real-space optimization , 2014, Nature Communications.
[40] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[41] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[42] Michael F. P. O'Boyle,et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation , 2004, The Journal of Supercomputing.
[43] Jack J. Dongarra,et al. Autotuning GEMM Kernels for the Fermi GPU , 2012, IEEE Transactions on Parallel and Distributed Systems.
[44] Anna Sikora,et al. A multi-aspect online tuning framework for HPC applications , 2017, Software Quality Journal.
[45] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[46] Siegfried Benkner,et al. Automatic Performance Tuning of Pipeline Patterns for Heterogeneous Parallel Architectures , 2014 .