Generic System Calls for GPUs
Ján Veselý | Mark Oskin | Gabriel H. Loh | Abhishek Bhattacharjee | Steven K. Reinhardt | Arkaprava Basu
[1] Brian N. Bershad,et al. Using continuations to implement thread management and communication in operating systems , 1991, SOSP '91.
[2] William J. Dally,et al. A bandwidth-efficient architecture for media processing , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[3] William J. Dally,et al. Comparing Reyes and OpenGL on a stream architecture , 2002, HWWS '02.
[4] William J. Dally,et al. A stream processor development platform , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.
[5] Mark Oskin,et al. Using modern graphics architectures for general-purpose computing: a framework and analysis , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[6] William J. Dally,et al. Media processing applications on the Imagine stream processor , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.
[7] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.
[8] Jens H. Krüger,et al. A Survey of General‐Purpose Computation on Graphics Hardware , 2007, Eurographics.
[9] Matthew Might,et al. Continuations and transducer composition , 2006, PLDI '06.
[10] David Tarditi,et al. Accelerator: using data parallelism to program GPUs for general-purpose uses , 2006, ASPLOS XII.
[11] John D. Owens,et al. GPU-to-CPU Callbacks , 2010, Euro-Par Workshops.
[12] Asynchronous System Calls , 2010.
[13] Atomic Operations , 2011, Encyclopedia of Parallel Computing.
[14] John D. Owens,et al. Extending MPI to accelerators , 2011, ASBD '11.
[15] Graham Sellers,et al. Virtual texturing in software and hardware , 2012, SIGGRAPH '12.
[16] Mike O'Connor,et al. Cache-Conscious Wavefront Scheduling , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[17] Architectural Support for Address Translation on GPUs , 2013.
[18] Mike O'Connor,et al. Divergence-Aware Warp Scheduling , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] David A. Wood,et al. Heterogeneous system coherence for integrated CPU-GPU systems , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Mahmut T. Kandemir,et al. Neither more nor less: Optimizing thread-level parallelism for GPGPUs , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[21] Idit Keidar,et al. GPUfs: Integrating a file system with GPUs , 2013, TOCS.
[22] Mark Silberstein,et al. GPUnet , 2014, OSDI.
[23] Dhabaleswar K. Panda,et al. GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation , 2014, IEEE Transactions on Parallel and Distributed Systems.
[24] Lena Oden. Direct communication methods for distributed GPUs , 2014.
[25] David A. Wood,et al. Supporting x86-64 address translation for 100s of GPU lanes , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[26] David A. Wood,et al. Border control: Sandboxing accelerators , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[27] Jeffrey S. Vetter,et al. A Survey of CPU-GPU Heterogeneous Computing Techniques , 2015, ACM Comput. Surv.
[28] Marc Snir,et al. MiniAMR - A miniapp for Adaptive Mesh Refinement , 2016.
[29] Thomas F. Wenisch,et al. Selective GPU caches to eliminate CPU-GPU HW cache coherence , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[30] Ján Veselý,et al. Observations and opportunities in architecting shared virtual memory for heterogeneous systems , 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[31] Ganesh Gopalakrishnan,et al. Portable inter-workgroup barrier synchronisation for GPUs , 2016, OOPSLA.
[32] Mahmut T. Kandemir,et al. Controlled Kernel Launch for Dynamic Parallelism in GPUs , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[33] Mark Silberstein,et al. SPIN: Seamless Operating System Integration of Peer-to-Peer DMA Between SSDs and GPUs , 2019, USENIX Annual Technical Conference.
[34] Abhishek Bhattacharjee,et al. Efficient Address Translation for Architectures with Multiple Page Sizes , 2017, ASPLOS.
[35] Tor M. Aamodt,et al. Warp Scheduling for Fine-Grained Synchronization , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[36] Mark Silberstein,et al. ActivePointers: A Case for Software Address Translation on GPUs , 2018, OPSR.
[37] Mohamed Ibrahim,et al. Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).