Shuhai: Benchmarking High Bandwidth Memory on FPGAs
Jie Zhang | Zeke Wang | Gustavo Alonso | Hongjing Huang