PALMOS: A Transparent, Multi-tasking Acceleration Layer for Parallel Heterogeneous Systems
[1] Peter Druschel, et al. Resource containers: a new facility for resource management in server systems, 1999, OSDI '99.
[2] Michael L. Scott, et al. Disengaged scheduling for fair, protected access to fast computational accelerators, 2014, ASPLOS.
[3] Collin McCurdy, et al. Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[4] Rajkishore Barik, et al. Efficient Mapping of Irregular C++ Applications to Integrated GPUs, 2014, CGO '14.
[5] Saman P. Amarasinghe, et al. Portable performance on heterogeneous architectures, 2013, ASPLOS '13.
[6] Srimat T. Chakradhar, et al. A virtual memory based runtime to support multi-tenancy in clusters with GPUs, 2012, HPDC '12.
[7] Adrian Schüpbach, et al. The multikernel: a new OS architecture for scalable multicore systems, 2009, SOSP '09.
[8] Serge E. Hallyn, et al. Linux capabilities: making them work, 2008.
[9] Kathryn S. McKinley, et al. Hoard: a scalable memory allocator for multithreaded applications, 2000, SIGP.
[10] Vivien Quéma, et al. Traffic management: a holistic approach to memory placement on NUMA systems, 2013, ASPLOS '13.
[11] Bowen Alpern, et al. PDS: a virtual execution environment for software deployment, 2005, VEE '05.
[12] Shinpei Kato, et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments, 2011, USENIX Annual Technical Conference.
[13] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[14] Lingjia Tang, et al. Whare-map: heterogeneity in "homogeneous" warehouse-scale computers, 2013, ISCA.
[15] R. Govindarajan, et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS '13.
[16] John Kubiatowicz, et al. Tessellation: Refactoring the OS around explicit resource containers with continuous adaptation, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[17] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[18] Thomas Fahringer, et al. LibWater: heterogeneous distributed computing made easy, 2013, ICS '13.
[19] Larry L. Peterson, et al. Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors, 2007, EuroSys '07.
[20] A. Kivity, et al. kvm: the Linux Virtual Machine Monitor, 2007.
[21] Jungwon Kim, et al. Achieving a single compute device image in OpenCL for multiple GPUs, 2011, PPoPP '11.
[22] Scott A. Mahlke, et al. Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems, 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[23] Xiaoyun Zhu, et al. Capacity and Performance Overhead in Dynamic Resource Allocation to Virtual Containers, 2007, 2007 10th IFIP/IEEE International Symposium on Integrated Network Management.
[24] Jeffrey S. Vetter, et al. Quantifying NUMA and contention effects in multi-GPU systems, 2011, GPGPU-4.
[25] Nam Sung Kim, et al. The case for GPGPU spatial multitasking, 2012, IEEE International Symposium on High-Performance Computer Architecture.
[26] Massimiliano Fatica, et al. Multi-GPU Programming, 2014.
[27] Marianne Shaw, et al. Scale and performance in the Denali isolation kernel, 2002, OSDI '02.
[28] Federico Silla, et al. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters, 2010, 2010 International Conference on High Performance Computing & Simulation.
[29] Wu-chun Feng, et al. VOCL: An optimized environment for transparent virtualization of graphics processing units, 2012, 2012 Innovative Parallel Computing (InPar).
[30] Andrew Birrell, et al. Implementing remote procedure calls, 1984, TOCS.
[31] Kathryn S. McKinley, et al. Composing high-performance memory allocators, 2001, PLDI '01.
[32] Feng Ji, et al. RSVM: A Region-based Software Virtual Memory for GPU, 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[33] P. Menage. Adding Generic Process Containers to the Linux Kernel, 2010.
[34] Dejan S. Milojicic, et al. Exploring the performance and mapping of HPC applications to platforms in the cloud, 2012, HPDC '12.
[35] Christina Delimitrou, et al. Quasar: resource-efficient and QoS-aware cluster management, 2014, ASPLOS.
[36] Kevin Skadron, et al. Enabling Task Parallelism in the CUDA Scheduler, 2009.