Extending OpenACC for Efficient Stencil Code Generation and Execution by Skeleton Frameworks
Mario A. R. Dantas | Márcio Bastos Castro | Luís Fabrício Wanderley Góes | Alyson D. Pereira | Rodrigo C. O. Rocha
[1] Christoph W. Kessler, et al. SkePU: a multi-backend skeleton programming library for multi-GPU systems, 2010, HLPP '10.
[2] Alyson D. Pereira, et al. TOAST: Automatic tiling for iterative stencil computations on GPUs, 2017, Concurr. Comput. Pract. Exp.
[3] Maury M. Gouvea, et al. Cloud dynamics simulation with cellular automata, 2010, SummerSim.
[4] Monica S. Lam, et al. Efficient and exact data dependence analysis, 1991, PLDI '91.
[5] Vadim Maslov, et al. Delinearization: an efficient way to break multiloop dependence equations, 1992, PLDI '92.
[6] Albert Cohen, et al. Induction Variable Analysis with Delayed Abstractions, 2005, HiPEAC.
[7] Murray Cole, et al. PARTANS: An autotuning framework for stencil computation on multi-GPU systems, 2013, TACO.
[8] Michel Steuwer, et al. LIFT: A functional data-parallel IR for high-performance GPU code generation, 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[9] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[10] Sergei Gorlatch, et al. SkelCL - A Portable Skeleton Library for High-Level GPU Programming, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[11] Matthias Christen, et al. Patus for convenient high-performance stencils: Evaluation in earthquake simulations, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Christian Terboven, et al. OpenACC - First Experiences with Real-World Applications, 2012, Euro-Par.
[13] J. Ramanujam, et al. On Recovering Multi-Dimensional Arrays in Polly, 2015.
[14] Amirali Baniasadi, et al. Employing Software-Managed Caches in OpenACC, 2016, ACM Trans. Model. Perform. Evaluation Comput. Syst.
[15] Michael Wolfe, et al. Beyond induction variables: detecting and classifying sequences using a demand-driven SSA form, 1995, TOPLAS.
[16] Michael McCool, et al. Structured parallel programming with deterministic patterns, 2010.
[17] Luís Fabrício Wanderley Góes, et al. PSkel: A stencil programming framework for CPU-GPU systems, 2015, Concurr. Comput. Pract. Exp.
[18] Kevin Skadron, et al. A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations, 2011, International Journal of Parallel Programming.
[19] L. Dagum, et al. OpenMP: an industry standard API for shared-memory programming, 1998.
[20] James Demmel. Applied Numerical Linear Algebra, 1997.
[21] Andreas Simbürger, et al. On the Variety of Static Control Parts in Real-World Programs: from Affine via Multi-dimensional to Polynomial and Just-in-Time, 2013.
[22] P. Sadayappan, et al. High-performance code generation for stencil computations on GPU architectures, 2012, ICS '12.
[23] Richard Veras, et al. A stencil compiler for short-vector SIMD architectures, 2013, ICS '13.
[24] Christoph W. Kessler, et al. SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems, 2018, International Journal of Parallel Programming.
[25] Martin Gardner. Mathematical games: the fantastic combinations of John Conway's new solitaire game "life", 1970, Scientific American.
[26] Satoshi Matsuoka, et al. Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).