Streaming message interface: high-performance distributed memory programming on reconfigurable hardware
[1] Ralph Wittig,et al. MPI as a Programming Model for High-Performance Reconfigurable Computers , 2010, TRETS.
[2] Gustavo Alonso,et al. FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[3] Guo Chen,et al. Direct Universal Access: Making Data Center Resources Available to FPGA , 2019, NSDI.
[4] Chen Yang,et al. Novo-G#: Large-scale reconfigurable computing with direct and programmable interconnects , 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[5] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[6] Satoshi Matsuoka,et al. Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL , 2018, FPGA.
[7] Taisuke Boku,et al. OpenCL-ready High Speed FPGA Network for Reconfigurable High Performance Computing , 2018, HPC Asia.
[8] Torsten Hoefler,et al. Transformations of High-Level Synthesis Codes for High-Performance Computing , 2018, IEEE Transactions on Parallel and Distributed Systems.
[9] Gustavo Alonso,et al. Application Partitioning on FPGA Clusters: Inference over Decision Tree Ensembles , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[10] Torsten Hoefler,et al. FBLAS: Streaming Linear Algebra on FPGA , 2019, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] Jason Cong,et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[12] Satoru Yamamoto,et al. Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth , 2014, IEEE Transactions on Parallel and Distributed Systems.
[13] Robert G. Dimond,et al. Accelerating Large-Scale HPC Applications Using FPGAs , 2011, 2011 IEEE 20th Symposium on Computer Arithmetic.
[14] Jason Cong,et al. Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster , 2016, ISLPED.
[15] John Freeman,et al. From OpenCL to High-Performance Hardware on FPGAs , 2012, 22nd International Conference on Field Programmable Logic and Applications (FPL).
[16] Alan D. George,et al. An OpenCL Framework for Distributed Apps on a Multidimensional Network of FPGAs , 2016, 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3).
[17] Wu-chun Feng,et al. MPI-ACC: Accelerator-Aware MPI for Scientific Applications , 2016, IEEE Transactions on Parallel and Distributed Systems.
[18] Philip Heng Wai Leong,et al. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.
[19] Chen Yang,et al. FPDeep: Acceleration and Load Balancing of CNN Training on FPGA Clusters , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[20] Naif Tarafdar,et al. A Modular Heterogeneous Stack for Deploying FPGAs and CPUs in the Data Center , 2019, FPGA.
[21] J. Demmel,et al. Sun Microsystems , 1996 .
[22] Torsten Hoefler,et al. Deadlock-Free Oblivious Routing for Arbitrary Topologies , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[23] Jungwon Kim,et al. IMPACC: A Tightly Integrated MPI+OpenACC Framework Exploiting Shared Memory Parallelism , 2016, HPDC.
[24] Hari Angepat,et al. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave , 2018, IEEE Micro.
[25] Torsten Hoefler,et al. dCUDA: Hardware Supported Overlap of Computation and Communication , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.