Machine Learning Based Auto-Tuning for Enhanced OpenCL Performance Portability