Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
[1] Vivek Sarkar et al. Linear scan register allocation, 1999, TOPL.
[2] J. Demmel et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Federico Silla et al. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters, 2010, 2010 International Conference on High Performance Computing & Simulation.
[4] Jie Chen et al. Analysis and approximation of optimal co-scheduling on Chip Multiprocessors, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[5] Kevin Skadron et al. Enabling Task Parallelism in the CUDA Scheduler, 2009.
[6] Sheldon M. Ross. Stochastic Processes.
[7] Bingsheng He et al. Relational query coprocessing on graphics processors, 2009, TODS.
[8] Stijn Eyerman et al. Probabilistic job symbiosis modeling for SMT processor scheduling, 2010, ASPLOS XV.
[9] Sara S. Baghsorkhi et al. Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors, 2012, PPoPP '12.
[10] Norbert Luttenberger et al. Efficiently Using a CUDA-enabled GPU as Shared Resource, 2010, 2010 10th IEEE International Conference on Computer and Information Technology.
[11] Jianlong Zhong et al. Parallel Graph Processing on Graphics Processors Made Easy, 2013, Proc. VLDB Endow.
[12] R. Govindarajan et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS '13.
[13] Vanish Talwar et al. GViM: GPU-accelerated virtual machines, 2009, HPCVirt '09.
[14] Michael Stumm et al. Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors, 2007, EuroSys '07.
[15] Michael Garland et al. Understanding throughput-oriented architectures, 2010, Commun. ACM.
[16] Dinesh Manocha et al. GPUTeraSort: high performance graphics co-processor sorting for large database management, 2006, SIGMOD Conference.
[17] Burton J. Smith et al. High performance discrete Fourier transforms on graphics processors, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] Jianlong Zhong et al. Medusa: Simplified Graph Processing on GPUs, 2014, IEEE Transactions on Parallel and Distributed Systems.
[19] Bandwidth intensive 3-D FFT kernel for GPUs using CUDA, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] David I. August et al. Automatic CPU-GPU communication management and optimization, 2011, PLDI '11.
[21] Kevin Skadron et al. Fine-grained resource sharing for concurrent GPGPU kernels, 2012, HotPar '12.
[22] Hyesoon Kim et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[23] A. Snavely et al. Symbiotic jobscheduling for a simultaneous multithreading processor, 2000, SIGP.
[24] William Stallings. Operating Systems: Internals and Design Principles (7th ed.), 2001.
[25] Bingsheng He et al. Mars: Accelerating MapReduce with Graphics Processors, 2011, IEEE Transactions on Parallel and Distributed Systems.
[26] Dean M. Tullsen et al. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor, 2002, SIGMETRICS '02.
[27] Dirk Grunwald et al. Methods for modeling resource contention on simultaneous multithreading processors, 2005, 2005 International Conference on Computer Design.
[28] William Stallings. Operating Systems: Internals and Design Principles, 1991.
[29] Mark Silberstein et al. PTask: operating system abstractions to manage GPUs as compute devices, 2011, SOSP.
[30] William Gropp et al. An adaptive performance modeling tool for GPU architectures, 2010, PPoPP '10.
[31] Bingsheng He et al. Relational joins on graphics processors, 2008, SIGMOD Conference.
[32] Richard W. Vuduc et al. A performance analysis framework for identifying potential benefits in GPGPU applications, 2012, PPoPP '12.
[33] Susan J. Eggers et al. Thread-Sensitive Scheduling for SMT Processors, 2000.
[34] Wen-mei W. Hwu et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[35] Mauricio J. Serrano et al. A Model for Performance Estimation in a Multistreamed Superscalar Processor, 1994, Computer Performance Evaluation.
[36] T. Steinke et al. On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering, 2012, 2012 Symposium on Application Accelerators in High Performance Computing.
[37] Jack J. Dongarra et al. Optimizing symmetric dense matrix-vector multiplication on GPUs, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[38] M. J. Serrano. Performance estimation in a simultaneous multithreading processor, 1996, Proceedings of MASCOTS '96 - 4th International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[39] Shinpei Kato et al. RGEM: A Responsive GPGPU Execution Model for Runtime Engines, 2011, 2011 IEEE 32nd Real-Time Systems Symposium.
[40] Yao Zhang et al. A quantitative performance analysis model for GPU architectures, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[41] Tor M. Aamodt et al. A first-order fine-grained multithreaded throughput model, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[42] Gregory J. Chaitin et al. Register allocation & spilling via graph coloring, 1982, SIGPLAN '82.
[43] Naga K. Govindaraju et al. Mars: A MapReduce Framework on graphics processors, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[44] Shinpei Kato et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments, 2011, USENIX Annual Technical Conference.
[45] Srimat T. Chakradhar et al. Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework, 2011, HPDC '11.
[46] Vikram K. Narayana et al. GPU Resource Sharing and Virtualization on High Performance Computing Systems, 2011, 2011 International Conference on Parallel Processing.