Exploring the Efficiency of OpenCL Pipe for Hiding Memory Latency on Cloud FPGAs
Hamed Tabkhi | Arnab A. Purkayastha | Sai Raghavendran | Jhanani Thiagarajan
[1] Peng Zhang,et al. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[2] Pingfan Meng,et al. Real-time 3D reconstruction for FPGAs: A case study for evaluating the performance, area, and programmability trade-offs of the Altera OpenCL SDK , 2014, 2014 International Conference on Field-Programmable Technology (FPT).
[3] Bingsheng He,et al. Performance Modeling and Directives Optimization for High-Level Synthesis on FPGA , 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[4] Jason Cong,et al. Bandwidth optimization through on-chip memory restructuring for HLS , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[5] Santosh G. Abraham,et al. Effective stream-based and execution-based data prefetching , 2004, ICS '04.
[6] Michael J. Flynn,et al. Hardware and software cache prefetching techniques for MPEG benchmarks , 2000, IEEE Trans. Circuits Syst. Video Technol.
[7] David J. Lilja,et al. Data prefetch mechanisms , 2000, CSUR.
[8] David R. Kaeli,et al. Exploring the Efficiency of the OpenCL Pipe Semantic on an FPGA , 2016, SIGARCH Comput. Archit. News.
[9] Hamed Tabkhi,et al. Taxonomy of Spatial Parallelism on FPGAs for Massively Parallel Applications , 2018, 2018 31st IEEE International System-on-Chip Conference (SOCC).
[10] Yun Liang,et al. Lin-Analyzer: A high-level performance analysis tool for FPGA-based accelerators , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[11] Doris Chen,et al. Fractal video compression in OpenCL: An evaluation of CPUs, GPUs, and FPGAs as acceleration platforms , 2013, 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC).
[12] Jason Cong,et al. Understanding Performance Differences of FPGAs and GPUs: (Abstract Only) , 2018, FPGA.
[13] Jing Li,et al. Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network , 2017, FPGA.
[14] Kenta Kasai,et al. Flexible non-binary LDPC decoding on FPGAs , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] John Wawrzynek,et al. Architectural synthesis of computational pipelines with decoupled memory access , 2014, 2014 International Conference on Field-Programmable Technology (FPT).
[16] Henry M. Levy,et al. An Architecture for Software-Controlled Data Prefetching , 1991, ISCA.
[17] Seth H. Pugsley,et al. Efficiently prefetching complex address patterns , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Wei Zhang,et al. A performance analysis framework for optimizing OpenCL applications on FPGAs , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[19] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[20] Collin McCurdy,et al. Diagnosis and optimization of application prefetching performance , 2013, ICS '13.
[21] Hamed Tabkhi,et al. Locality Aware Memory Assignment and Tiling , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[22] Alexander V. Veidenbaum,et al. Multiple stream tracker: a new hardware stride prefetcher , 2014, Conf. Computing Frontiers.
[23] Wei Zhang,et al. FlexCL: An analytical performance model for OpenCL workloads on flexible FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[24] Satoshi Matsuoka,et al. Evaluating and Optimizing OpenCL Kernels for High Performance Computing with FPGAs , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] David Bernstein,et al. Compiler techniques for data prefetching on the PowerPC , 1995, PACT.
[26] Shankar Balachandran,et al. Hardware prefetchers for emerging parallel applications , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[27] Martin Margala,et al. High level programming of FPGAs for HPC and data centric applications , 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).
[28] Sean O. Settle. High-performance Dynamic Programming on FPGAs with OpenCL , 2013 .
[29] Peng Zhang. Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[30] Darren J. Kerbyson,et al. Analysis of double buffering on two different multicore architectures: Quad-core Opteron and the Cell-BE , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[31] Andrew C. Ling,et al. An OpenCL(TM) Deep Learning Accelerator on Arria 10 , 2017 .