Accelerating graph applications on integrated GPU platforms via instrumentation-driven optimizations
Vanish Talwar | Karsten Schwan | Yuan Chen | Naila Farooqui | Indrajit Roy
[1] Rajkishore Barik, et al. Efficient Mapping of Irregular C++ Applications to Integrated GPUs, 2014, CGO '14.
[2] Shinpei Kato, et al. Gdev: First-Class GPU Resource Management in the Operating System, 2012, USENIX Annual Technical Conference.
[3] Karthik Nilakant, et al. On the Efficacy of APUs for Heterogeneous Graph Computation, 2014.
[4] Michael L. Scott, et al. Disengaged scheduling for fair, protected access to fast computational accelerators, 2014, ASPLOS.
[5] Jean-Philippe Martin, et al. Dandelion: a compiler and runtime for heterogeneous systems, 2013, SOSP.
[6] Michela Becchi, et al. Deploying Graph Algorithms on GPUs: An Adaptive Solution, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[7] Keshav Pingali, et al. Adaptive heterogeneous scheduling for integrated GPUs, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[8] Sudhakar Yalamanchili, et al. A characterization and analysis of PTX kernels, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[9] Mark Silberstein, et al. PTask: operating system abstractions to manage GPUs as compute devices, 2011, SOSP.
[10] Ling Liu, et al. Efficient data partitioning model for heterogeneous graphs in the cloud, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[11] Srihari Cadambi, et al. Interference-driven resource management for GPU-based heterogeneous clusters, 2012, HPDC '12.
[12] David Defour, et al. Barra: A Parallel Functional Simulator for GPGPU, 2010, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[13] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[14] Wei Jiang, et al. Scheduling Concurrent Applications on a Cluster of CPU-GPU Nodes, 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).
[15] Vanish Talwar, et al. Evaluating integrated graphics processors for data center workloads, 2013, HotPower '13.
[16] Yao Zhang, et al. A quantitative performance analysis model for GPU architectures, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[17] William Gropp, et al. An adaptive performance modeling tool for GPU architectures, 2010, PPoPP '10.
[18] Srimat T. Chakradhar, et al. A virtual memory based runtime to support multi-tenancy in clusters with GPUs, 2012, HPDC '12.
[19] Kunle Olukotun, et al. Accelerating CUDA graph algorithms at maximum warp, 2011, PPoPP '11.
[20] David Parello, et al. Barra, a Modular Functional GPU Simulator for GPGPU, 2009.
[21] Kevin Skadron, et al. Load balancing in a changing world: dealing with heterogeneity and performance variability, 2013, CF '13.
[22] Carlos Guestrin, et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, 2012.
[23] Saman P. Amarasinghe, et al. Portable performance on heterogeneous architectures, 2013, ASPLOS '13.
[24] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[25] Grigori Fursin, et al. Predictive Runtime Code Scheduling for Heterogeneous Architectures, 2008, HiPEAC.
[26] Andrew E. Turner, et al. Visualizing complex dynamics in many-core accelerator architectures, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[27] Keshav Pingali, et al. A lightweight infrastructure for graph analytics, 2013, SOSP.
[28] Srimat T. Chakradhar, et al. Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework, 2011, HPDC '11.
[29] Tao Li, et al. Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications, 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[30] Keshav Pingali, et al. A quantitative study of irregular programs on GPUs, 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[31] Shinpei Kato, et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments, 2011, USENIX Annual Technical Conference.
[32] Sudhakar Yalamanchili, et al. Red Fox: An Execution Environment for Relational Query Processing on GPUs, 2014, CGO '14.
[33] Karsten Schwan, et al. Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures, 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[34] Kevin Skadron, et al. Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[35] Martin D. F. Wong, et al. An effective GPU implementation of breadth-first search, 2010, Design Automation Conference.
[36] Vanish Talwar, et al. Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems, 2011, USENIX ATC.