Memory-access-aware Safety and Profitability Analysis for Transformation of Accelerator-bound OpenMP Loops
[1] Sven-Bodo Scholz, et al. Unibench: A Tool for Automated and Collaborative Benchmarking, 2010, 2010 IEEE 18th International Conference on Program Comprehension.
[2] Sunita Chandrasekaran, et al. SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance, 2014, PMBS@SC.
[3] Michael F. P. O'Boyle, et al. Portable mapping of data parallel programs to OpenCL for heterogeneous systems, 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[4] José Nelson Amaral, et al. Automated GPU Grid Geometry Selection for OpenMP Kernels, 2018, 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
[5] Utpal Banerjee, et al. Dependence analysis for supercomputing, 1988, The Kluwer international series in engineering and computer science.
[6] Kyle A. Gallivan, et al. A unified framework for nonlinear dependence testing and symbolic analysis, 2004, ICS '04.
[7] Ken Kennedy, et al. Practical dependence testing, 1991, PLDI '91.
[8] Lawrence Rauchwerger, et al. Logical inference techniques for loop parallelization, 2012, PLDI.
[9] Uday Bondhugula, et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.
[10] Michael Wolfe, et al. Optimizing supercompilers for supercomputers, 1989, ICS.
[11] Patrick Cousot, et al. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints, 1977, POPL.
[12] Constantine D. Polychronopoulos, et al. Symbolic Program Analysis and Optimization for Parallelizing Compilers, 1992, LCPC.
[13] Graham D. Riley, et al. Formalizing OpenMP Performance Properties with ASL, 2000, ISHPC.
[14] Rudolf Eigenmann, et al. The range test: a dependence test for symbolic, non-linear expressions, 1994, Proceedings of Supercomputing '94.
[15] Rudolf Eigenmann, et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization, 2009, PPoPP '09.
[16] Xuhao Chen, et al. Performance model for OpenMP parallelized loops, 2011, Proceedings 2011 International Conference on Transportation, Mechanical, and Electrical Engineering (TMEE).
[17] Sungdo Moon, et al. Predicated array data-flow analysis for run-time parallelization, 1998, ICS '98.
[18] Michael F. P. O'Boyle, et al. Mapping parallelism to multi-cores: a machine learning based approach, 2009, PPoPP '09.
[19] Scott A. Mahlke, et al. Effective compiler support for predicated execution using the hyperblock, 1992, MICRO 25.
[20] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[21] Kevin Skadron, et al. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs, 2009, ICS.
[22] Lawrence Rauchwerger, et al. Scalable conditional induction variables (CIV) analysis, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[23] Leslie Lamport, et al. The parallel execution of DO loops, 1974, CACM.
[24] Arthur Stoutchinin, et al. Efficient static single assignment form for predication, 2001, MICRO.
[25] D. Zhang, et al. The value evolution graph and its use in memory reference analysis, 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[26] David Norton, et al. Performance Portability and OpenACC, 2014.
[27] Michael Wolfe, et al. Beyond induction variables: detecting and classifying sequences using a demand-driven SSA form, 1995, TOPLAS.
[28] Thomas E. Cheatham, et al. Symbolic evaluation of programs: a look at loop analysis, 1976, SYMSAC '76.
[29] Yunheung Paek, et al. Efficient and precise array access analysis, 2002, TOPLAS.