Pointer-Based Divergence Analysis for OpenCL 2.0 Programs