An Analytical Study of Recursive Tree Traversal Patterns on Multi- and Many-Core Platforms
[1] Michela Becchi et al. Deploying Graph Algorithms on GPUs: An Adaptive Solution. 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[2] Guoyang Chen et al. Free Launch: Optimizing GPU Dynamic Kernel Launches through Thread Reuse. 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[3] P. J. Narayanan et al. Accelerating Large Graph Algorithms on the GPU Using CUDA. HiPC 2007.
[4] Kunle Olukotun et al. Accelerating CUDA Graph Algorithms at Maximum Warp. PPoPP '11.
[5] Keshav Pingali et al. A Quantitative Study of Irregular Programs on GPUs. 2012 IEEE International Symposium on Workload Characterization (IISWC).
[6] Ümit V. Çatalyürek et al. Betweenness Centrality on GPUs and Heterogeneous Architectures. GPGPU@ASPLOS 2013.
[7] Michael Goldfarb et al. General Transformations for GPU Execution of Tree Traversals. 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[8] Wu-chun Feng et al. Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[9] Michela Becchi et al. Nested Parallelism on GPU: Exploring Parallelization Templates for Irregular Loops and Recursive Computations. 2015 44th International Conference on Parallel Processing.
[10] Mehmet Deveci et al. Parallel Graph Coloring for Manycore Architectures. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[11] Andrew S. Grimshaw et al. Scalable GPU Graph Traversal. PPoPP '12.
[12] Michael Garland et al. Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths. 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[13] Xiangke Liao et al. RegTT: Accelerating Tree Traversals on GPUs by Exploiting Regularities. 2016 45th International Conference on Parallel Processing (ICPP).
[14] Sudhakar Yalamanchili et al. Characterization and Analysis of Dynamic Parallelism in Unstructured GPU Applications. 2014 IEEE International Symposium on Workload Characterization (IISWC).
[15] Keshav Pingali et al. Atomic-Free Irregular Computations on GPUs. GPGPU@ASPLOS 2013.
[16] Kevin Skadron et al. Pannotia: Understanding Irregular GPGPU Graph Applications. 2013 IEEE International Symposium on Workload Characterization (IISWC).
[17] Yi Yang et al. CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications. Journal of Computer Science and Technology, 2015.
[18] Srimat T. Chakradhar et al. GRapid: A Compilation and Runtime Framework for Rapid Prototyping of Graph Applications on Many-Core Processors. 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
[19] Martin D. F. Wong et al. An Effective GPU Implementation of Breadth-First Search. Design Automation Conference 2010.
[20] Suresh Venkatasubramanian et al. Evaluating Graph Coloring on GPUs. PPoPP '11.
[21] Michela Becchi et al. Compiler-Assisted Workload Consolidation for Efficient Dynamic Parallelism on GPU. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[22] Matei Ripeanu et al. On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest. 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[23] John D. Owens et al. Performance Characterization of High-Level Programming Models for GPU Graph Analytics. 2015 IEEE International Symposium on Workload Characterization.
[24] Keshav Pingali et al. Data-Driven Versus Topology-Driven Irregular Computations on GPUs. 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.