Moka: Model-based concurrent kernel analysis
[1] David W. Nellans, et al. Flexible software profiling of GPU architectures, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[2] T. Steinke, et al. On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering, 2012, 2012 Symposium on Application Accelerators in High Performance Computing.
[3] Jason Maassen, et al. Performance Models for CPU-GPU Data Transfers, 2014, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[4] David A. Boas, et al. Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units, 2009, Optics Express.
[5] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[6] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.
[7] Scott A. Mahlke, et al. Chimera: Collaborative Preemption for Multitasking on a Shared GPU, 2015, ASPLOS.
[8] Tarek A. El-Ghazawi, et al. Exploiting concurrent kernel execution on graphic processing units, 2011, 2011 International Conference on High Performance Computing & Simulation.
[9] Xiangyu Li, et al. Mystic: Predictive Scheduling for GPU Based Cloud Servers Using Machine Learning, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[10] Wu-chun Feng, et al. Power and Performance Characterization of Computational Kernels on the GPU, 2010, 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing.
[11] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009.
[12] William J. Dally, et al. The GPU Computing Era, 2010, IEEE Micro.
[13] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[14] Guoyang Chen, et al. A software framework for efficient preemptive scheduling on GPU, 2016.
[15] Yun Liang, et al. Efficient GPU Spatial-Temporal Multitasking, 2015, IEEE Transactions on Parallel and Distributed Systems.
[16] David R. Kaeli, et al. GPU-Accelerated HMM for Speech Recognition, 2014, 2014 43rd International Conference on Parallel Processing Workshops.
[17] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] Tulika Mitra, et al. Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[19] Xinxin Mei, et al. Dissecting GPU Memory Hierarchy Through Microbenchmarking, 2015, IEEE Transactions on Parallel and Distributed Systems.
[20] Samuel Williams, et al. An auto-tuning framework for parallel multicore stencil computations, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[21] Zhongliang Chen, et al. TwinKernels: An execution model to improve GPU hardware scheduling at compile time, 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[22] Kevin Skadron, et al. Fine-grained resource sharing for concurrent GPGPU kernels, 2012, HotPar '12.
[23] Erik Lindholm, et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.
[24] Thomas Steinke, et al. Multi-threaded Kernel Offloading to GPGPU Using Hyper-Q on Kepler Architecture, 2014.
[25] David R. Kaeli, et al. Multi2Sim: A simulation framework for CPU-GPU computing, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[26] Mahmut T. Kandemir, et al. Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications, 2014, GPGPU@ASPLOS.
[27] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[28] Sudhakar Yalamanchili, et al. A characterization and analysis of PTX kernels, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[29] R. Govindarajan, et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS '13.
[30] Yuri Torres, et al. Understanding the impact of CUDA tuning techniques for Fermi, 2011, 2011 International Conference on High Performance Computing & Simulation.