GPUnet
Mark Silberstein | Emmett Witchel | Xinya Zhang | Yige Hu | Seonggu Huh | Amir Wated | Sangman Kim
[1] Justin Talbot,et al. Phoenix++: modular MapReduce for shared-memory systems , 2011, MapReduce '11.
[2] Dhabaleswar K. Panda,et al. MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[3] J. Xu. OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems , 2009 .
[4] Eddie Kohler,et al. Events Can Make Sense , 2007, USENIX Annual Technical Conference.
[5] Mark Silberstein,et al. PTask: operating system abstractions to manage GPUs as compute devices , 2011, SOSP.
[6] Shinpei Kato,et al. Zero-copy I/O processing for low-latency GPU computing , 2013, 2013 ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS).
[7] Idit Keidar,et al. GPUfs: Integrating a file system with GPUs , 2013, TOCS.
[8] Ozalp Babaoglu,et al. ACM Transactions on Computer Systems , 2007 .
[9] Davide Rossetti,et al. APEnet+: a 3D Torus network optimized for GPU-based HPC Systems , 2012 .
[10] Zhen Wang,et al. K2 , 2015, False Summit.
[11] Parag Agrawal,et al. The case for RAMClouds: scalable high-performance storage entirely in DRAM , 2010, OPSR.
[12] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[13] Amin Vahdat,et al. Themis: an I/O-efficient MapReduce , 2012, SoCC '12.
[14] W. Richard Stevens,et al. Unix network programming , 1990, CCRV.
[15] Avi Mendelson,et al. GPUpIO: the case for I/O-driven preemption on GPUs , 2016, GPGPU@PPoPP.
[16] Mark Silberstein,et al. GPUrdma: GPU-side library for high performance networking from GPU kernels , 2016, ROSS@HPDC.
[17] Byung-Gon Chun,et al. MegaPipe: A New Programming Interface for Scalable Network I/O , 2012, OSDI.
[18] Feng Ji,et al. RSVM: A Region-based Software Virtual Memory for GPU , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[19] Sangjin Han,et al. PacketShader: a GPU-accelerated software router , 2010, SIGCOMM '10.
[20] John D. Owens,et al. Multi-GPU MapReduce on GPU Clusters , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[21] W. Richard Stevens,et al. TCP/IP Illustrated, Volume 1: The Protocols , 1994 .
[22] Christopher R. Johnson,et al. PIKA: A Network Service for Multikernel Operating Systems , 2014 .
[23] Sotiris Ioannidis,et al. GASPP: A GPU-Accelerated Stateful Packet Processing Framework , 2014, USENIX Annual Technical Conference.
[24] Jie Cheng,et al. Programming Massively Parallel Processors. A Hands-on Approach , 2010, Scalable Comput. Pract. Exp..
[25] Robert Ricci,et al. Fast and flexible: Parallel packet processing with GPUs and click , 2013, Architectures for Networking and Communications Systems.
[26] David A. Maltz,et al. Network traffic characteristics of data centers in the wild , 2010, IMC '10.
[27] Shinpei Kato,et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments , 2011, USENIX Annual Technical Conference.
[28] Adrian Schüpbach,et al. The multikernel: a new OS architecture for scalable multicore systems , 2009, SOSP '09.
[29] George C. Necula,et al. Capriccio: scalable threads for internet services , 2003, SOSP '03.
[30] Bryan Ford,et al. Structured streams: a new transport abstraction , 2007, SIGCOMM '07.
[31] David E. Culler,et al. SEDA: an architecture for well-conditioned, scalable internet services , 2001, SOSP.
[32] Seungyeop Han,et al. SSLShader: Cheap SSL Acceleration with Commodity Processors , 2011, NSDI.
[33] David G. Andersen,et al. Using vector interfaces to deliver millions of IOPS from a networked key-value storage server , 2012, SoCC '12.
[34] Matti Pietikäinen,et al. Face Description with Local Binary Patterns: Application to Face Recognition , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[35] Dhabaleswar K. Panda,et al. Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs , 2013, 2013 42nd International Conference on Parallel Processing.
[36] Thomas R. Gross,et al. On limitations of network acceleration , 2013, CoNEXT.
[37] Dhabaleswar K. Panda,et al. High performance RDMA-based MPI implementation over InfiniBand , 2003, ICS.
[38] Jun Pang,et al. Rhythm: harnessing data parallel hardware for server workloads , 2014, ASPLOS.
[39] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[40] Muli Ben-Yehuda,et al. IsoStack - Highly Efficient Network Processing on Dedicated Cores , 2010, USENIX Annual Technical Conference.
[41] Jean-Philippe Martin,et al. Dandelion: a compiler and runtime for heterogeneous systems , 2013, SOSP.
[42] Idit Keidar,et al. GPUfs: integrating a file system with GPUs , 2013, ASPLOS '13.