GPU Performance and Power Tuning Using Regression Trees
[1] Kapil Vaswani, et al. Construction and use of linear regression models for processor performance analysis, 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[2] R Core Team, et al. R: A language and environment for statistical computing, 2014.
[3] Bin Li, et al. Tree structured analysis on GPU power study, 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).
[4] David I. August, et al. Compiler optimization-space exploration, 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[5] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[6] Ronald L. Rivest, et al. Constructing Optimal Binary Decision Trees is NP-Complete, 1976, Inf. Process. Lett.
[7] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Grey Giddins, et al. Statistics, 2016, The Journal of hand surgery, European volume.
[9] L. Leemis. Applied Linear Regression Models, 1991.
[10] Hyesoon Kim, et al. An integrated GPU power and performance model, 2010, ISCA.
[11] Burr Settles, et al. Active Learning Literature Survey, 2009.
[12] Walter D. Fisher. On Grouping for Maximum Homogeneity, 1958.
[13] David D. Cox, et al. Machine learning for predictive auto-tuning with boosted regression trees, 2012, 2012 Innovative Parallel Computing (InPar).
[14] Shoaib Kamil, et al. OpenTuner: An extensible framework for program autotuning, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[15] Naga K. Govindaraju, et al. Auto-tuning of fast fourier transform on graphics processors, 2011, PPoPP '11.
[16] Yuri Torres, et al. Understanding the impact of CUDA tuning techniques for Fermi, 2011, 2011 International Conference on High Performance Computing & Simulation.
[17] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[18] Wei-Yin Loh, et al. Classification and regression trees, 2011, WIREs Data Mining Knowl. Discov.
[19] Richard W. Vuduc, et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs, 2010, PPoPP '10.
[20] Steven Salzberg, et al. Decision Tree Induction: How Effective is the Greedy Heuristic?, 1995, KDD.
[21] Margaret Martonosi, et al. Stargazer: Automated regression-based GPU design space exploration, 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[22] Tom R. Halfhill. NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Processing, 2009.
[23] David M. Brooks, et al. Accurate and efficient regression modeling for microarchitectural performance and power prediction, 2006, ASPLOS XII.
[24] Olivier Temam, et al. Collective optimization: A practical collaborative approach, 2010, TACO.
[25] Wei-Yin Loh, et al. Classification and Regression Tree Methods, 2008.
[26] Wen-mei W. Hwu, et al. Program optimization space pruning for a multithreaded gpu, 2008, CGO '08.
[27] Tuning CUDA Applications for Kepler, 2017.
[28] Jack J. Dongarra, et al. A Note on Auto-tuning GEMM for GPUs, 2009, ICCS.
[29] Satoshi Matsuoka, et al. Statistical power modeling of GPU kernels using performance counters, 2010, International Conference on Green Computing.
[30] Richard W. Vuduc, et al. A performance analysis framework for identifying potential benefits in GPGPU applications, 2012, PPoPP '12.
[31] Archana Ganapathi, et al. A case for machine learning to optimize multicore performance, 2009.
[32] Margaret Martonosi, et al. Starchart: Hardware and software optimization using recursive partitioning regression trees, 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[33] Michael F. P. O'Boyle, et al. A large-scale cross-architecture evaluation of thread-coarsening, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).