Function Call Re-Vectorization
[1] Xipeng Shen, et al. On-the-fly elimination of dynamic irregularities for GPU computing, 2011, ASPLOS XVI.
[2] Sudhakar Yalamanchili, et al. Characterization and analysis of dynamic parallelism in unstructured GPU applications, 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).
[3] Guoyang Chen, et al. Free launch: Optimizing GPU dynamic kernel launches through thread reuse, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[4] Donald E. Knuth, et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.
[5] Clifford Stein, et al. Introduction to Algorithms, 2nd edition, 2001.
[6] Sebastian Hack, et al. The Impact of the SIMD Width on Control-Flow and Memory Divergence, 2014, ACM Trans. Archit. Code Optim.
[7] Yi Yang, et al. CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications, 2015, Journal of Computer Science and Technology.
[8] Alan M. Frieze, et al. Random graphs, 2006, SODA '06.
[9] Jin Wang, et al. Dynamic Thread Block Launch: A lightweight execution mechanism to support irregular applications on GPUs, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[10] Laxmi N. Bhuyan, et al. Efficient warp execution in presence of divergence with collaborative context collection, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[11] Sudhakar Yalamanchili, et al. LaPerm: Locality Aware Scheduler for Dynamic Parallelism on GPUs, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[12] Yen-Chen Liu, et al. Knights Landing: Second-Generation Intel Xeon Phi Product, 2016, IEEE Micro.
[13] William J. Dally, et al. The GPU Computing Era, 2010, IEEE Micro.
[14] Fernando Magno Quintão Pereira, et al. Divergence analysis, 2013, ACM Trans. Program. Lang. Syst.
[15] T. Lindvall. On a Routing Problem, 2004, Probability in the Engineering and Informational Sciences.
[16] Jingyue Wu, et al. gpucc: An open-source GPGPU compiler, 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[17] M. Pharr, et al. ispc: A SPMD compiler for high-performance CPU programming, 2012, 2012 Innovative Parallel Computing (InPar).
[18] Roman Novak, et al. Loop Optimization for Divergence Reduction on GPUs with SIMT Architecture, 2015, IEEE Transactions on Parallel and Distributed Systems.
[19] Joe D. Warren, et al. The program dependence graph and its use in optimization, 1987, TOPL.
[20] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.
[21] Luc Bougé, et al. Control structures for data-parallel SIMD languages: semantics and implementation, 1992, Future Gener. Comput. Syst.
[22] A. Grimshaw, et al. High Performance and Scalable Radix Sorting: a Case Study of Implementing Dynamic Parallelism for GPU Computing, 2011, Parallel Process. Lett.
[23] Michela Taufer, et al. Performance impact of dynamic parallelism on different clustering algorithms, 2013, Defense, Security, and Sensing.
[24] Peng Tu, et al. Writing scalable SIMD programs with ISPC, 2014, WPMVP '14.
[25] Michael Garland, et al. Understanding throughput-oriented architectures, 2010, Commun. ACM.
[26] Fernando Magno Quintão Pereira, et al. Profiling divergences in GPU applications, 2013, Concurr. Comput. Pract. Exp.
[27] Benedict R. Gaster. An Execution Model for OpenCL 2.0, 2014.
[28] Timothy G. Mattson, et al. OpenCL Programming Guide, 2011.
[29] Dorota H. Kieronska, et al. Formal Specification of Parallel SIMD Execution, 1996, Theor. Comput. Sci.
[30] Fernando Magno Quintão Pereira, et al. Divergence Analysis and Optimizations, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[31] Tarek S. Abdelrahman, et al. Reducing divergence in GPGPU programs with loop merging, 2013, GPGPU@ASPLOS.
[32] Fernando Magno Quintão Pereira, et al. Divergence Analysis with Affine Constraints, 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.
[33] Jie Cheng, et al. CUDA by Example: An Introduction to General-Purpose GPU Programming, 2010, Scalable Comput. Pract. Exp.
[34] Nicolas Pinto, et al. PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation, 2009, Parallel Comput.
[35] Ronan Keryell, et al. POMP or How to Design a Massively Parallel Machine with Small Developments, 1991, PARLE.
[36] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.