Extending High-Level Synthesis with High-Performance Computing Performance Visualization
[1] G. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. AFIPS '67 (Spring), 1967.
[2] John Freeman et al. From OpenCL to high-performance hardware on FPGAs. 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012.
[3] Pat Hanrahan et al. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication. Graphics Hardware, 2004.
[4] Mats Brorsson et al. Empowering OpenMP with automatically generated hardware. 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2016.
[5] Jan Langer et al. OmpSs@Zynq all-programmable SoC ecosystem. FPGA, 2014.
[6] Dirk Schmidl et al. Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir. Parallel Tools Workshop, 2011.
[7] Jason Helge Anderson et al. LegUp: high-level synthesis for FPGA-based processor/accelerator systems. FPGA '11, 2011.
[8] Alejandro Duran et al. OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures. Parallel Processing Letters, 2011.
[9] Satoshi Matsuoka et al. Designing and accelerating spiking neural networks using OpenCL for FPGAs. 2017 International Conference on Field Programmable Technology (ICFPT), 2017.
[10] Jiayi Sheng et al. Fully Integrated On-FPGA Molecular Dynamics Simulations. arXiv, 2019.
[11] George Ho et al. PAPI: A Portable Interface to Hardware Performance Counters. 1999.
[12] Andreas Koch et al. Optimized high-level synthesis of SMT multi-threaded hardware accelerators. 2015 International Conference on Field Programmable Technology (FPT), 2015.
[13] Andreas Koch et al. Automatic high-level synthesis of multi-threaded hardware accelerators. 2014 24th International Conference on Field Programmable Logic and Applications (FPL), 2014.
[14] Eriko Nurvitadhi et al. Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC. 2016 International Conference on Field-Programmable Technology (FPT), 2016.
[15] Toni Cortes et al. PARAVER: A Tool to Visualize and Analyze Parallel Code. 2007.
[16] Charles E. Leiserson et al. Optimizing Synchronous Circuitry by Retiming (Preliminary Version). 1983.
[17] Nikolaos Bellas et al. SoCLog: A real-time, automatically generated logging and profiling mechanism for FPGA-based Systems On Chip. 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016.
[18] L. Dagum et al. OpenMP: an industry standard API for shared-memory programming. 1998.
[19] Yong Dou et al. 64-bit floating-point FPGA matrix multiplication. FPGA '05, 2005.
[20] Alan D. George et al. Communication visualization for bottleneck detection of high-level synthesis applications. FPGA '12, 2012.
[21] Mats Brorsson et al. Grain graphs: OpenMP performance analysis made easy. PPoPP, 2016.
[22] Matthias S. Müller et al. The Vampir Performance Analysis Tool-Set. Parallel Tools Workshop, 2008.
[23] Andreas Koch et al. Hardware/software co-compilation with the Nymble system. 2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), 2013.
[24] Jason Helge Anderson et al. Source-level debugging for FPGA high-level synthesis. 2014 24th International Conference on Field Programmable Logic and Applications (FPL), 2014.
[25] Satoshi Matsuoka et al. Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL. FPGA, 2018.
[26] Alan D. George et al. Performance Analysis Framework for High-Level Language Applications in Reconfigurable Computing. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2009.
[27] Robert A. van de Geijn et al. High-performance implementation of the level-3 BLAS. ACM Transactions on Mathematical Software (TOMS), 2008.
[28] Bernd Hamann et al. State of the Art of Performance Visualization. EuroVis, 2014.