Improving Auto-Tuning Convergence Times with Dynamically Generated Predictive Performance Models
暂无分享,去创建一个
[1] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[2] Frédo Durand,et al. Decoupling algorithms from schedules for easy optimization of image processing pipelines , 2012, ACM Trans. Graph..
[3] Jianbin Fang,et al. An Auto-tuning Solution to Data Streams Clustering in OpenCL , 2011, 2011 14th IEEE International Conference on Computational Science and Engineering.
[4] José M. F. Moura,et al. Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Alogorithms , 2004, Int. J. High Perform. Comput. Appl..
[5] Wei-Yin Loh,et al. Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..
[6] Margaret Martonosi,et al. Starchart: Hardware and software optimization using recursive partitioning regression trees , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[7] Jack J. Dongarra,et al. A Note on Auto-tuning GEMM for GPUs , 2009, ICCS.
[8] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[9] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[10] John H. Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .
[11] Yiqun Liu,et al. MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs , 2013, Journal of Computer Science and Technology.
[12] Chun Chen,et al. Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology , 2010, Software Automatic Tuning, From Concepts to State-of-the-Art Results.
[13] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[14] Christopher J. Fluke,et al. Accelerating incoherent dedispersion , 2012, 1201.5380.
[15] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[16] Margaret Martonosi,et al. Stargazer: Automated regression-based GPU design space exploration , 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[17] David D. Cox,et al. Machine learning for predictive auto-tuning with boosted regression trees , 2012, 2012 Innovative Parallel Computing (InPar).
[18] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[19] Shoaib Kamil,et al. OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[20] Stanislav G. Sedukhin,et al. Performance Tuning of Matrix Multiplication in OpenCL on Different GPUs and CPUs , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[21] Prasanna Balaprakash,et al. Active-learning-based surrogate models for empirical performance tuning , 2013, 2013 IEEE International Conference on Cluster Computing (CLUSTER).
[22] Michael F. P. O'Boyle,et al. A large-scale cross-architecture evaluation of thread-coarsening , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[23] Katherine A. Yelick,et al. Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.