Demystifying the Memory System of Modern Datacenter FPGAs for Software Programmers through Microbenchmarking
[1] Cody Hao Yu,et al. Best-Effort FPGA Programming: A Few Steps Can Go a Long Way, 2018, ArXiv.
[2] Jason Cong,et al. Understanding Performance Differences of FPGAs and GPUs , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[3] Yaxin Bi,et al. KNN Model-Based Approach in Classification , 2003, OTM.
[4] Zhenman Fang,et al. Measuring Microarchitectural Details of Multi- and Many-Core Memory Systems through Microbenchmarking , 2015, ACM Trans. Archit. Code Optim.
[5] Srinivasan Parthasarathy,et al. Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] John R. Rice,et al. An Interactive Symbolic-Numeric Interface to Parallel ELLPACK for Building General PDE Solvers , 1990.
[7] Jeffrey Stuecheli,et al. CAPI: A Coherent Accelerator Processor Interface , 2015, IBM J. Res. Dev.
[8] Hans-Peter Kriegel,et al. Optimal multi-step k-nearest neighbor search , 1998, SIGMOD '98.
[9] Jason Cong,et al. A quantitative analysis on microarchitectures of modern CPU-FPGA platforms , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[10] Gu-Yeon Wei,et al. MachSuite: Benchmarks for accelerator design and customized architectures , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).
[11] Jason Cong,et al. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[12] Michael Garland,et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[13] Jie Zhang,et al. Shuhai: Benchmarking High Bandwidth Memory On FPGAs , 2020, 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[14] Feifei Li,et al. K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).
[15] N. Altman. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression , 1992.
[16] Karthikeyan Sankaralingam,et al. Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.
[17] Jason Cong,et al. In-Depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms , 2019, ACM Trans. Reconfigurable Technol. Syst.
[18] Onur Mutlu,et al. Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization , 2016, SIGMETRICS.
[19] Eriko Nurvitadhi,et al. A sparse matrix vector multiply accelerator for support vector machine , 2015, 2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).
[20] Jason Cong,et al. HBM Connect: High-Performance HLS Interconnect for FPGA HBM , 2021, FPGA.
[21] Zhenman Fang,et al. CHIP-KNN: A Configurable and High-Performance K-Nearest Neighbors Accelerator on Cloud FPGAs , 2020, 2020 International Conference on Field-Programmable Technology (ICFPT).
[22] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[23] Jason Cong,et al. Bandwidth optimization through on-chip memory restructuring for HLS , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[24] Eric S. Chung,et al. A reconfigurable fabric for accelerating large-scale datacenter services , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).