FlexCL: An analytical performance model for OpenCL workloads on flexible FPGAs
[1] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[2] Yun Liang,et al. High-Level Synthesis: Productivity, Performance, and Software Constraints , 2012, J. Electr. Comput. Eng.
[3] Yun Liang,et al. Lin-Analyzer: A high-level performance analysis tool for FPGA-based accelerators , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[4] Jason Helge Anderson,et al. LegUp: high-level synthesis for FPGA-based processor/accelerator systems , 2011, FPGA '11.
[5] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.
[6] Jason Cong,et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[7] Kazutoshi Wakabayashi,et al. Divide and conquer high-level synthesis design space exploration , 2012, TODAES.
[8] Wei Zhang,et al. A performance analysis framework for optimizing OpenCL applications on FPGAs , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[9] Yun Liang,et al. A comprehensive framework for synthesizing stencil algorithms on FPGAs using OpenCL model , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[10] Jason Cong,et al. An efficient and versatile scheduling algorithm based on SDC formulation , 2006, 2006 43rd ACM/IEEE Design Automation Conference.
[11] B. Ramakrishna Rau,et al. Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.
[12] Gu-Yeon Wei,et al. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[13] Frank Vahid,et al. Warp Processing: Dynamic Translation of Binaries to FPGA Circuits , 2008, Computer.
[14] Yu Cao,et al. Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks , 2016, FPGA.
[15] Hyojin Choi,et al. Memory access pattern-aware DRAM performance model for multi-core systems , 2011, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[16] Luca P. Carloni,et al. On learning-based methods for design-space exploration with High-Level Synthesis , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[17] Josep Llosa,et al. Swing modulo scheduling: a lifetime-sensitive approach , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques.
[18] Zhiru Zhang,et al. SDC-based modulo scheduling for pipeline synthesis , 2013, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[19] Lifan Xu,et al. Auto-tuning a high-level language targeted to GPU codes , 2012, 2012 Innovative Parallel Computing (InPar).
[20] Adrian Park,et al. Designing Modular Hardware Accelerators in C with ROCCC 2.0 , 2010, 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines.
[21] Karthikeyan Sankaralingam,et al. Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.
[22] Pedro C. Diniz,et al. A compiler approach to fast hardware design space exploration in FPGA-based systems , 2002, PLDI '02.
[23] Amit Kumar Singh,et al. Exploiting loop-array dependencies to accelerate the design space exploration with high level synthesis , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[24] Yong Wang,et al. SDA: Software-defined accelerator for large-scale DNN systems , 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).
[25] Jean Luc Philippe,et al. Design Space Pruning Through Early Estimations of Area/Delay Tradeoffs for FPGA Implementations , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.