A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays: Design, Evaluation, and Future Challenges
Niclas Jansson | Stefano Markidis | Tobias Kenter | Artur Podobas | Philipp Schlatter | Martin Karp | Christian Plessl