Optimizing Parallel Reduction on OpenCL FPGA Platform – A Case Study of Frequent Pattern Compression
[1] Wei Zhang et al. A study of data partitioning on OpenCL-based FPGAs. 25th International Conference on Field Programmable Logic and Applications (FPL), 2015.
[2] David A. Wood et al. Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches. 2004.
[3] George A. Constantinides et al. A Case for Work-stealing on FPGAs with OpenCL Atomics. FPGA, 2016.
[4] Wenguang Chen et al. MapCG: Writing parallel program portable between CPU and GPU. 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2010.
[5] John Freeman et al. From OpenCL to high-performance hardware on FPGAs. 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012.
[6] Maged M. Michael et al. High performance dynamic lock-free hash tables and list-based sets. SPAA '02, 2002.
[7] Eddy Z. Zhang et al. Massive atomics for massive parallelism on GPUs. ISMM '14, 2014.
[8] Christos-Savvas Bouganis et al. GPU Versus FPGA for High Productivity Computing. International Conference on Field Programmable Logic and Applications (FPL), 2010.
[9] Roberto Torres et al. Algorithmic strategies for optimizing the parallel reduction primitive in CUDA. International Conference on High Performance Computing & Simulation (HPCS), 2012.
[10] Wu-chun Feng et al. Inter-block GPU communication via fast barrier synchronization. IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
[11] Dirk Koch et al. FPGAs for Software Programmers. 2016.
[12] Wu-chun Feng et al. Performance Characterization and Optimization of Atomic Operations on AMD GPUs. IEEE International Conference on Cluster Computing, 2011.
[13] Viktor K. Prasanna et al. Designing scalable FPGA-based reduction circuits using pipelined floating-point cores. 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2005.
[14] Onur Mutlu et al. Base-Delta-Immediate Compression: A Practical Data Compression Mechanism for On-Chip Caches. 2012.
[15] Yu Ting Chen et al. A Survey and Evaluation of FPGA High-Level Synthesis Tools. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2016.
[16] Vincent Gramoli et al. More than you ever wanted to know about synchronization: Synchrobench, measuring the impact of the synchronization on concurrent algorithms. PPoPP, 2015.