High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection
Christian Plessl | Niclas Jansson | Stefano Markidis | Tobias Kenter | Artur Podobas | Philipp Schlatter | Martin Karp
[1] Marcel Gort,et al. From software to accelerators with LegUp high-level synthesis , 2013, 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).
[2] Ronan Keryell,et al. Optimizing OpenCL applications on Xilinx FPGA , 2016, IWOCL.
[3] Erwin Laure,et al. Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations , 2016, The Journal of Supercomputing.
[4] Satoshi Matsuoka,et al. Designing and accelerating spiking neural networks using OpenCL for FPGAs , 2017, 2017 International Conference on Field Programmable Technology (ICFPT).
[5] Qi Yu,et al. DLAU: A Scalable Deep Learning Accelerator Unit on FPGA , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[6] Jeffrey S. Vetter,et al. Architectures for the Post-Moore Era , 2017, IEEE Micro.
[7] John Freeman,et al. From OpenCL to high-performance hardware on FPGAs , 2012, 22nd International Conference on Field Programmable Logic and Applications (FPL).
[8] Russell Tessier,et al. FPGA Architecture: Survey and Challenges , 2008, Found. Trends Electron. Des. Autom..
[9] Satoshi Matsuoka,et al. Evaluating and Optimizing OpenCL Kernels for High Performance Computing with FPGAs , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] Mitsuhisa Sato,et al. PEACH2: An FPGA-based PCIe network device for Tightly Coupled Accelerators , 2014, CARN.
[11] Haohuan Fu,et al. Accelerating 3D convolution using streaming architectures on FPGAs , 2009 .
[12] Satoru Yamamoto,et al. FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks , 2017, IEEE Transactions on Parallel and Distributed Systems.
[13] Philip Heng Wai Leong,et al. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.
[14] Michel Schanen,et al. On the Strong Scaling of the Spectral Element Solver Nek5000 on Petascale Systems , 2016, EASC.
[15] Niclas Jansson,et al. Optimization of Tensor-product Operations in Nekbone on GPUs , 2020, ArXiv.
[16] Georgi Gaydadjiev,et al. Maxeler Data-Flow in Computational Finance , 2015 .
[17] Timothy C. Warburton,et al. Acceleration of tensor-product operations for high-order finite element methods , 2017, Int. J. High Perform. Comput. Appl..
[18] Christian Plessl,et al. Evaluating FPGA Accelerator Performance with a Parameterized OpenCL Adaptation of the HPCChallenge Benchmark Suite , 2020, ArXiv.
[19] Christian Plessl,et al. OpenCL Implementation of Cannon’s Matrix Multiplication Algorithm on Intel Stratix 10 FPGAs , 2019, 2019 International Conference on Field-Programmable Technology (ICFPT).
[20] Kentaro Sano,et al. OpenMP Device Offloading to FPGAs Using the Nymble Infrastructure , 2020, IWOMP.
[21] Jungwon Kim,et al. OpenACC to FPGA: A Framework for Directive-Based High-Performance Reconfigurable Computing , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[22] Mats Brorsson,et al. Empowering OpenMP with automatically generated hardware , 2016, 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS).
[23] Martin C. Herbordt,et al. An OpenCL 3D FFT for Molecular Dynamics Simulations on Multiple FPGAs , 2020, ArXiv.
[24] Hamid Reza Zohouri,et al. The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface , 2019, 2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC).
[25] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[26] Satoshi Matsuoka,et al. From FLOPS to BYTES: disruptive change in high-performance computing towards the post-Moore era , 2016, Conf. Computing Frontiers.
[27] K. Bernstein,et al. Scaling, power, and the future of CMOS , 2005, IEEE International Electron Devices Meeting, 2005. IEDM Technical Digest.
[28] Samuel Williams,et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures , 2008 .
[29] Satoshi Matsuoka,et al. Evaluating high-level design strategies on FPGAs for high-performance computing , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[30] Jiayi Sheng,et al. Fully Integrated On-FPGA Molecular Dynamics Simulations , 2019, ArXiv.
[31] Péter Szolgay,et al. FPGA based acceleration of computational fluid flow simulation on unstructured mesh geometry , 2012, 22nd International Conference on Field Programmable Logic and Applications (FPL).
[32] Hal Finkel,et al. Exploring the Random Network of Hodgkin and Huxley Neurons with Exponential Synaptic Conductances on OpenCL FPGA Platform , 2019, 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[33] Christian Plessl,et al. Flexible FPGA design for FDTD using OpenCL , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[34] Yoshiki Yamaguchi,et al. FPGA-Based Computational Fluid Dynamics Simulation Architecture via High-Level Synthesis Design Method , 2020, ARC.
[35] Masanori Hariyama,et al. OpenCL-Based FPGA-Platform for Stencil Computation and Its Optimization Methodology , 2017, IEEE Transactions on Parallel and Distributed Systems.
[36] Satoshi Matsuoka,et al. Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL , 2018, FPGA.
[37] Chun Chen,et al. Speeding up Nek5000 with autotuning and specialization , 2010, ICS '10.
[38] Christian Plessl,et al. OpenCL-Based FPGA Design to Accelerate the Nodal Discontinuous Galerkin Method for Unstructured Meshes , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).