Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL
[1] Mike Hutton. Stratix® 10: 14nm FPGA delivering 1GHz, 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).
[2] Naoyuki Onodera, et al. High-Productivity Framework on GPU-Rich Supercomputers for Operational Weather Prediction Code ASUCA, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Satoru Yamamoto, et al. FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks, 2017, IEEE Transactions on Parallel and Distributed Systems.
[4] Naoya Maruyama, et al. Optimizing Stencil Computations for NVIDIA Kepler GPUs, 2014.
[5] John Freeman, et al. From OpenCL to high-performance hardware on FPGAs, 2012, 22nd International Conference on Field Programmable Logic and Applications (FPL).
[6] Marco D. Santambrogio, et al. On How to Accelerate Iterative Stencil Loops, 2015, ACM Trans. Archit. Code Optim.
[7] Jun Zhou, et al. Physics-based seismic hazard analysis on petascale heterogeneous supercomputers, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[8] J. Ramanujam, et al. SDSLc: a multi-target domain-specific compiler for stencil computations, 2015, WOLFHPC@SC.
[9] Tomofumi Yuki, et al. One size does not fit all: Implementation trade-offs for iterative stencil computations on FPGAs, 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[10] Eriko Nurvitadhi, et al. Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?, 2017, FPGA.
[11] Pradeep Dubey, et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Yun Liang, et al. A comprehensive framework for synthesizing stencil algorithms on FPGAs using OpenCL model, 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[13] Chao Yang, et al. Ultra-Scalable CPU-MIC Acceleration of Mesoscale Atmospheric Modeling on Tianhe-2, 2015, IEEE Transactions on Computers.
[14] Satoshi Matsuoka, et al. Evaluating and Optimizing OpenCL Kernels for High Performance Computing with FPGAs, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[15] Frank O. Bryan, et al. Impact of ocean model resolution on CCSM climate simulations, 2012, Climate Dynamics.
[16] Christian Plessl, et al. Flexible FPGA design for FDTD using OpenCL, 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[17] R. Neale, et al. Improvements in a half degree atmosphere/land version of the CCSM, 2010.
[18] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[19] Masanori Hariyama, et al. OpenCL-Based FPGA-Platform for Stencil Computation and Its Optimization Methodology, 2017, IEEE Transactions on Parallel and Distributed Systems.