Ján Veselý | Mark Oskin | Gabriel H. Loh | Abhishek Bhattacharjee | Steven K. Reinhardt | Arkaprava Basu
[1] Sudhakar Yalamanchili et al. Coordinated energy management in heterogeneous processors, 2014, Sci. Program.
[2] Thomas F. Wenisch et al. Selective GPU caches to eliminate CPU-GPU HW cache coherence, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[3] William J. Dally et al. A bandwidth-efficient architecture for media processing, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[4] Graham Sellers et al. Virtual texturing in software and hardware, 2012, SIGGRAPH '12.
[5] Mark Oskin et al. Using modern graphics architectures for general-purpose computing: a framework and analysis, 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002 (MICRO-35). Proceedings.
[6] Asynchronous System Calls, 2010.
[7] David A. Wood et al. Border control: Sandboxing accelerators, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[8] Mark Silberstein et al. GPUnet, 2014, OSDI.
[9] William J. Dally et al. Comparing Reyes and OpenGL on a stream architecture, 2002, HWWS '02.
[10] Abhishek Bhattacharjee et al. Efficient Address Translation for Architectures with Multiple Page Sizes, 2017, ASPLOS.
[11] Jens H. Krüger et al. A Survey of General-Purpose Computation on Graphics Hardware, 2007, Eurographics.
[12] Matthew Might et al. Continuations and transducer composition, 2006, PLDI '06.
[13] Brian N. Bershad et al. Using continuations to implement thread management and communication in operating systems, 1991, SOSP '91.
[14] Abhishek Bhattacharjee et al. Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces, 2013, ASPLOS.
[15] Idit Keidar et al. GPUfs: Integrating a file system with GPUs, 2013, TOCS.
[16] Mark Silberstein et al. ActivePointers: A Case for Software Address Translation on GPUs, 2018, OPSR.
[17] Dhabaleswar K. Panda et al. GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation, 2014, IEEE Transactions on Parallel and Distributed Systems.
[18] David Tarditi et al. Accelerator: using data parallelism to program GPUs for general-purpose uses, 2006, ASPLOS XII.
[19] Marc Snir et al. MiniAMR - A miniapp for Adaptive Mesh Refinement, 2016.
[20] George Bosilca et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.
[21] Ján Veselý et al. Observations and opportunities in architecting shared virtual memory for heterogeneous systems, 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[22] David A. Wood et al. Supporting x86-64 address translation for 100s of GPU lanes, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[23] John D. Owens et al. Extending MPI to accelerators, 2011, ASBD '11.
[24] William J. Dally et al. A stream processor development platform, 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.
[25] Jeffrey S. Vetter et al. A Survey of CPU-GPU Heterogeneous Computing Techniques, 2015, ACM Comput. Surv.
[26] David A. Wood et al. Heterogeneous system coherence for integrated CPU-GPU systems, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[27] Lena Oden. Direct communication methods for distributed GPUs, 2014.