A Domain-Specific Language and Compiler for Stencil Computations on Short-Vector SIMD and GPU Architectures
Richard Veras | Franz Franchetti | J. Ramanujam | P. Sadayappan | Atanas Rountev | Louis-Noël Pouchet | Thomas Henretty | Justin Holewinski
[1] Kevin Skadron, et al. A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations, 2011, International Journal of Parallel Programming.
[2] Jason Cong, et al. Accelerating Fluid Registration Algorithm on Multi-FPGA Platforms, 2011, 2011 21st International Conference on Field Programmable Logic and Applications.
[3] Lei Huang, et al. PADS: A Pattern-Driven Stencil Compiler-Based Tool for Reuse of Optimizations on GPGPUs, 2011, 2011 IEEE 17th International Conference on Parallel and Distributed Systems.
[4] Franz Franchetti, et al. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures, 2011, CC.
[5] Ravindra K. Ahuja, et al. Network Flows: Theory, Algorithms, and Applications, 1993.
[6] Helmar Burkhart, et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[8] P. Sadayappan, et al. High-performance code generation for stencil computations on GPU architectures, 2012, ICS '12.
[9] A. Taflove. The Finite-Difference Time-Domain Method, 1995.
[10] Jason Cong, et al. Lithographic aerial image simulation with FPGA-based hardware acceleration, 2008, FPGA '08.
[11] G. Smith, et al. Numerical Solution of Partial Differential Equations: Finite Difference Methods, 1978.
[12] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.
[13] Gerhard Wellein, et al. Efficient multicore-aware parallelization strategies for iterative stencil computations, 2010, J. Comput. Sci.
[14] Albert Cohen, et al. Split tiling for GPUs: automatic parallelization using trapezoidal tiles, 2013, GPGPU@ASPLOS.
[15] Helmar Burkhart, et al. PATUS: A Code Generation and Auto-Tuning Framework For Parallel Stencil Computations, 2011.
[16] G. Dantzig, et al. Finding a Cycle in a Graph with Minimum Cost to Time Ratio with Application to a Ship Routing Problem, 1966.
[17] Uday Bondhugula, et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[18] W. Marsden. I and J, 2012.
[19] Ronald L. Rivest, et al. Introduction to Algorithms, Second Edition, 2001.
[20] Hans-Peter Seidel, et al. Cache oblivious parallelograms in iterative stencil computations, 2010, ICS '10.
[21] Pradeep Dubey, et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Paulius Micikevicius, et al. 3D finite difference computation on GPUs using CUDA, 2009, GPGPU-2.
[23] Ronald L. Rivest, et al. Introduction to Algorithms, 1990.
[24] Uday Bondhugula, et al. Tiling stencil computations to maximize parallelism, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Richard Veras, et al. A stencil compiler for short-vector SIMD architectures, 2013, ICS '13.
[26] Bradley C. Kuszmaul, et al. The Pochoir stencil compiler, 2011, SPAA '11.
[27] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[28] John B. Shoven, et al. I, Edinburgh Medical and Surgical Journal.