Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization
Vanish Talwar | Karsten Schwan | Yuan Chen | Rajkishore Barik | Brian T. Lewis | Tatiana Shpeisman | Naila Farooqui | Indrajit Roy
[1] David R. Kaeli, et al. Exploring the multiple-GPU design space, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[2] Rajkishore Barik, et al. Efficient Mapping of Irregular C++ Applications to Integrated GPUs, 2014, CGO '14.
[3] Michela Becchi, et al. Deploying Graph Algorithms on GPUs: An Adaptive Solution, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[4] Karthik Nilakant, et al. On the Efficacy of APUs for Heterogeneous Graph Computation, 2014.
[5] Michael L. Scott, et al. Disengaged scheduling for fair, protected access to fast computational accelerators, 2014, ASPLOS.
[6] Jean-Philippe Martin, et al. Dandelion: a compiler and runtime for heterogeneous systems, 2013, SOSP.
[7] Shinpei Kato, et al. Gdev: First-Class GPU Resource Management in the Operating System, 2012, USENIX Annual Technical Conference.
[8] Albert Cohen, et al. Correct and efficient work-stealing for weak memory models, 2013, PPoPP '13.
[9] Yi Guo, et al. SLAW: A scalable locality-aware adaptive work-stealing scheduler, 2010, IPDPS.
[10] Lei Wang, et al. An adaptive task creation strategy for work-stealing scheduling, 2010, CGO '10.
[11] Gagan Agrawal, et al. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations, 2010, ICS '10.
[12] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[13] Vanish Talwar, et al. Evaluating integrated graphics processors for data center workloads, 2013, HotPower '13.
[14] Sudhakar Yalamanchili, et al. A characterization and analysis of PTX kernels, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[15] Mark Silberstein, et al. PTask: operating system abstractions to manage GPUs as compute devices, 2011, SOSP.
[16] William Gropp, et al. An adaptive performance modeling tool for GPU architectures, 2010, PPoPP '10.
[17] David Defour, et al. Barra: A Parallel Functional Simulator for GPGPU, 2010, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[18] Shinpei Kato, et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments, 2011, USENIX Annual Technical Conference.
[19] Sudhakar Yalamanchili, et al. Red Fox: An Execution Environment for Relational Query Processing on GPUs, 2014, CGO '14.
[20] Srimat T. Chakradhar, et al. Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework, 2011, HPDC '11.
[21] David Chase, et al. Dynamic circular work-stealing deque, 2005, SPAA '05.
[22] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[23] Wei Jiang, et al. Scheduling Concurrent Applications on a Cluster of CPU-GPU Nodes, 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012).
[24] David Grove, et al. Work-stealing without the baggage, 2012, OOPSLA '12.
[25] Ling Liu, et al. Efficient data partitioning model for heterogeneous graphs in the cloud, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[26] Martin D. F. Wong, et al. An effective GPU implementation of breadth-first search, 2010, Design Automation Conference.
[27] Bruno Raffin, et al. Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures, 2013.
[28] Yuxiong He, et al. Adaptive work-stealing with parallelism feedback, 2008, TOCS.
[29] Hyesoon Kim, et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[30] Vivek Sarkar, et al. Dynamic Task Parallelism with a GPU Work-Stealing Runtime System, 2011, LCPC.
[31] Bronis R. de Supinski, et al. Heterogeneous Task Scheduling for Accelerated OpenMP, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[32] Philippas Tsigas, et al. On dynamic load balancing on graphics processors, 2008, GH '08.
[33] Yu David Liu, et al. Energy-efficient work-stealing language runtimes, 2014, ASPLOS.
[34] Yao Zhang, et al. A quantitative performance analysis model for GPU architectures, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[35] Michael F. P. O'Boyle, et al. Portable mapping of data parallel programs to OpenCL for heterogeneous systems, 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[36] Kunle Olukotun, et al. Accelerating CUDA graph algorithms at maximum warp, 2011, PPoPP '11.
[37] Keshav Pingali, et al. Adaptive heterogeneous scheduling for integrated GPUs, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[38] David Parello, et al. Barra, a Modular Functional GPU Simulator for GPGPU, 2009.
[39] Keshav Pingali, et al. A quantitative study of irregular programs on GPUs, 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[40] Tao Li, et al. Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications, 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[41] Katherine Yelick, et al. Hierarchical Work Stealing on Manycore Clusters, 2011.
[42] Srihari Cadambi, et al. Interference-driven resource management for GPU-based heterogeneous clusters, 2012, HPDC '12.
[43] Robert D. Blumofe, et al. Scheduling multithreaded computations by work stealing, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[44] Karsten Schwan, et al. Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures, 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[45] Vanish Talwar, et al. Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems, 2011, USENIX ATC.
[46] Jungwon Kim, et al. Achieving a single compute device image in OpenCL for multiple GPUs, 2011, PPoPP '11.
[47] Scott A. Mahlke, et al. Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems, 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[48] Aaftab Munshi, et al. The OpenCL specification, 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[49] Jason Cong, et al. Mapping a data-flow programming model onto heterogeneous platforms, 2012, LCTES '12.
[50] Grigori Fursin, et al. Predictive Runtime Code Scheduling for Heterogeneous Architectures, 2008, HiPEAC.
[51] Andrew E. Turner, et al. Visualizing complex dynamics in many-core accelerator architectures, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[52] Michael A. Bender, et al. Scheduling Cilk multithreaded parallel programs on processors of different speeds, 2000, SPAA.
[53] Long Chen, et al. Dynamic load balancing on single- and multi-GPU systems, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[54] Srimat T. Chakradhar, et al. A virtual memory based runtime to support multi-tenancy in clusters with GPUs, 2012, HPDC '12.
[55] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[56] Kevin Skadron, et al. Load balancing in a changing world: dealing with heterogeneity and performance variability, 2013, CF '13.
[57] Carlos Guestrin, et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, 2012.
[58] Yi Guo, et al. SLAW: A scalable locality-aware adaptive work-stealing scheduler, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[59] Keshav Pingali, et al. A lightweight infrastructure for graph analytics, 2013, SOSP.
[60] R. Govindarajan, et al. Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices, 2014, CGO '14.