NARMADA: Near-Memory Horizontal Diffusion Accelerator for Scalable Stencil Computations
Sander Stuijk | Henk Corporaal | Christoph Hagleitner | Gagandeep Singh | Dionysios Diamantopoulos
[1] Sander Stuijk, et al. NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning, 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[2] Masanori Hariyama, et al. OpenCL-Based FPGA-Platform for Stencil Computation and Its Optimization Methodology, 2017, IEEE Transactions on Parallel and Distributed Systems.
[3] G. Doms, et al. The Nonhydrostatic Limited-Area Model LM (Lokal-Modell) of DWD: Part I: Scientific Documentation (Ve, 1999.
[4] Philip Brisk, et al. HLSPredict: Cross Platform Performance Prediction for FPGA High-Level Synthesis, 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[5] Heiner Giefers, et al. Accelerating arithmetic kernels with coherent attached FPGA coprocessors, 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[6] Torsten Hoefler, et al. MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures, 2015, ICS.
[7] W. Collins, et al. Description of the NCAR Community Atmosphere Model (CAM 3.0), 2004.
[8] Jason Cong, et al. SODA: Stencil with Optimized Dataflow Architecture, 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[9] Kiyoung Choi, et al. ExtraV: Boosting Graph Processing Near Storage with a Coherent Accelerator, 2017, Proc. VLDB Endow.
[10] Torsten Hoefler, et al. Designing scalable FPGA architectures using high-level synthesis, 2018, PPoPP.
[11] Scott Kehler, et al. High Resolution Deterministic Prediction System (HRDPS) Simulations of Manitoba Lake Breezes, 2016.
[12] Satoru Yamamoto, et al. Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth, 2014, IEEE Transactions on Parallel and Distributed Systems.
[13] Mohamed Wahib, et al. Scalable Kernel Fusion for Memory-Bound GPU Applications, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Jeffrey Stuecheli, et al. CAPI: A Coherent Accelerator Processor Interface, 2015, IBM J. Res. Dev.
[15] Heiner Giefers, et al. ecTALK: Energy efficient coherent transprecision accelerators — The bidirectional long short-term memory neural network case, 2018, 2018 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS).
[16] Christoph Hagleitner, et al. A System-Level Transprecision FPGA Accelerator for BLSTM Using On-chip Memory Reshaping, 2018, 2018 International Conference on Field-Programmable Technology (FPT).
[17] Jun A. Zhang, et al. Evaluating the Impact of Improvement in the Horizontal Diffusion Parameterization on Hurricane Prediction in the Operational Hurricane Weather Research and Forecast (HWRF) Model, 2018.
[18] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[19] Guangwen Yang, et al. Performance Tuning and Analysis for Stencil-Based Applications on POWER8 Processor, 2019, ACM Trans. Archit. Code Optim.
[20] Daniel Sánchez, et al. Jenga: Software-defined cache hierarchies, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[21] Robert Schmid, et al. Getting Started with CAPI SNAP: Hardware Development for Software Engineers, 2018, Euro-Par Workshops.
[22] Sander Stuijk, et al. A Review of Near-Memory Computing Architectures: Opportunities and Challenges, 2018, 2018 21st Euromicro Conference on Digital System Design (DSD).