EDGE: Event-Driven GPU Execution
[1] Mark Silberstein,et al. PTask: operating system abstractions to manage GPUs as compute devices , 2011, SOSP.
[2] Mark Silberstein,et al. GPUnet , 2014, OSDI.
[3] Sudhakar Yalamanchili,et al. Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[4] Stijn Eyerman,et al. System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.
[5] Rudolf Eigenmann,et al. Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks , 2017, PPoPP.
[6] Mateo Valero,et al. Enabling preemptive multiprogramming on GPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[7] Onur Mutlu,et al. Zorua: A holistic approach to resource virtualization in GPUs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[8] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[9] Changjun Jiang,et al. FLEP: Enabling Flexible and Efficient Preemption on GPUs , 2017, ASPLOS.
[10] Won Woo Ro,et al. Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[11] Yusuke Suzuki. Towards Multi-tenant GPGPU: Event-driven Programming Model for System-wide Scheduling on Shared GPUs , 2016 .
[12] Jin Wang,et al. Dynamic Thread Block Launch: A lightweight execution mechanism to support irregular applications on GPUs , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[13] Chen Sun,et al. Grus: Enabling Latency SLOs for GPU-Accelerated NFV Systems , 2018, 2018 IEEE 26th International Conference on Network Protocols (ICNP).
[14] Luiz André Barroso,et al. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009 .
[15] Steven Swanson,et al. DC express: shortest latency protocol for reading phase change memory over PCI express , 2014, FAST.
[16] Sue B. Moon,et al. NBA (network balancing act): a high-performance packet processing framework for heterogeneous processors , 2015, EuroSys.
[17] Tom R. Halfhill. NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Processing , 2009 .
[18] Andrew W. Moore,et al. Motivating future interconnects: a differential measurement analysis of PCI latency , 2009, ANCS '09.
[19] Yue Zhao,et al. EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU , 2017, PPoPP.
[20] Seth Copen Goldstein,et al. Active Messages: A Mechanism for Integrated Communication and Computation , 1992, Proceedings of the 19th Annual International Symposium on Computer Architecture (ISCA).
[21] Lizy Kurian John,et al. Extended Task Queuing: Active Messages for Heterogeneous Systems , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Shinpei Kato,et al. GPUvm: GPU Virtualization at the Hypervisor , 2016, IEEE Transactions on Computers.
[23] Sotiris Ioannidis,et al. GASPP: A GPU-Accelerated Stateful Packet Processing Framework , 2014, USENIX Annual Technical Conference.
[24] Wendong Hu,et al. NetBench: a benchmarking suite for network processors , 2001, IEEE/ACM International Conference on Computer Aided Design (ICCAD 2001).
[25] KyoungSoo Park,et al. APUNet: Revitalizing GPU as Packet Processing Accelerator , 2017, NSDI.
[26] Robert H. Dennard,et al. Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions , 1974, IEEE Journal of Solid-State Circuits.
[27] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[28] Enhong Chen,et al. KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC , 2017, SOSP.
[29] Tao Li,et al. Enabling Efficient Network Service Function Chain Deployment on Heterogeneous Server Platform , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[30] Jun Pang,et al. Rhythm: harnessing data parallel hardware for server workloads , 2014, ASPLOS.
[31] Shinpei Kato,et al. Gdev: First-Class GPU Resource Management in the Operating System , 2012, USENIX Annual Technical Conference.
[32] Tor M. Aamodt,et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[33] Shinpei Kato,et al. RGEM: A Responsive GPGPU Execution Model for Runtime Engines , 2011, 2011 IEEE 32nd Real-Time Systems Symposium.
[34] Mendel Rosenblum,et al. Network Interface Design for Low Latency Request-Response Protocols , 2013, USENIX ATC.
[35] Sangjin Han,et al. PacketShader: a GPU-accelerated software router , 2010, SIGCOMM '10.
[36] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[37] Mike O'Connor,et al. MemcachedGPU: scaling-up scale-out key-value stores , 2015, SoCC.
[38] Kevin Skadron,et al. Accelerating SQL database operations on a GPU with CUDA , 2010, GPGPU-3.
[39] Karthikeyan Sankaralingam,et al. iGPU: Exception support and speculative execution on GPUs , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[40] David A. Wood,et al. gem5-gpu: A Heterogeneous CPU-GPU Simulator , 2015, IEEE Computer Architecture Letters.
[41] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[42] Jeff A. Stuart,et al. A study of Persistent Threads style GPU programming for GPGPU workloads , 2012, 2012 Innovative Parallel Computing (InPar).
[43] Eduard Ayguadé,et al. Efficient Exception Handling Support for GPUs , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[44] Sudhakar Yalamanchili,et al. Characterization and analysis of dynamic parallelism in unstructured GPU applications , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).
[45] Ajay Jain,et al. Dynamic Space-Time Scheduling for GPU Inference , 2018, ArXiv.
[46] Idit Keidar,et al. GPUfs: Integrating a file system with GPUs , 2013, TOCS.
[47] Kai Yu,et al. Large-scale deep learning at Baidu , 2013, CIKM.
[48] Scott A. Mahlke,et al. Chimera: Collaborative Preemption for Multitasking on a Shared GPU , 2015, ASPLOS.
[49] Rami G. Melhem,et al. Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[50] Chia-Lin Yang,et al. Enabling fast preemption via Dual-Kernel support on GPUs , 2017, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC).
[51] Avi Mendelson,et al. GPUpIO: the case for I/O-driven preemption on GPUs , 2016, GPGPU@PPoPP.
[52] Mark Silberstein,et al. GPUrdma: GPU-side library for high performance networking from GPU kernels , 2016, ROSS@HPDC.
[53] Shinpei Kato,et al. Operating Systems Challenges for GPU Resource Management , 2011 .
[54] Dong Zhou,et al. Raising the Bar for Using GPUs in Software Packet Processing , 2015, NSDI.