Memory Aware Thread Aggregation Framework for Dynamic Parallelism in GPUs