OpenCL-Based FPGA-Platform for Stencil Computation and Its Optimization Methodology
Masanori Hariyama | Shunsuke Tatsumi | Hasitha Muthumala Waidyasooriya | Yasuhiro Takei