Programming and Synthesis for Software-defined FPGA Acceleration: Status and Future Prospects
[1] Fabrizio Ferrandi,et al. Using Efficient Path Profiling to Optimize Memory Consumption of On-Chip Debugging for High-Level Synthesis , 2017, ACM Trans. Embed. Comput. Syst..
[2] Florent de Dinechin,et al. Designing Custom Arithmetic Data Paths with FloPoCo , 2011, IEEE Design & Test of Computers.
[3] Tomofumi Yuki,et al. Toward Speculative Loop Pipelining for High-Level Synthesis , 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[4] Viktor K. Prasanna,et al. HitGraph: High-throughput Graph Processing Framework on FPGA , 2019, IEEE Transactions on Parallel and Distributed Systems.
[5] Franz Franchetti,et al. Computer Generation of Hardware for Linear Digital Signal Processing Transforms , 2012, TODE.
[6] Alessandro Cilardo,et al. Improving Multibank Memory Access Parallelism with Lattice-Based Partitioning , 2015, ACM Trans. Archit. Code Optim..
[7] Jason Cong,et al. Customizable Computing—From Single Chip to Datacenters , 2019, Proceedings of the IEEE.
[8] Christian Lengauer,et al. Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation , 2012, Parallel Process. Lett..
[9] Xuan Yang,et al. Programming Heterogeneous Systems from an Image Processing DSL , 2016, ACM Trans. Archit. Code Optim..
[10] Amit K. Roy-Chowdhury,et al. Evaluation and Acceleration of High-Throughput Fixed-Point Object Detection on FPGAs , 2015, IEEE Transactions on Circuits and Systems for Video Technology.
[11] Nong Xiao,et al. Coarse-Grained Parallel Routing With Recursive Partitioning for FPGAs , 2021, IEEE Transactions on Parallel and Distributed Systems.
[12] Ray C. C. Cheung,et al. Area-efficient architectures for double precision multiplier on FPGA, with run-time-reconfigurable dual single precision support , 2013, Microelectron. J..
[13] Yu Ting Chen,et al. A Survey and Evaluation of FPGA High-Level Synthesis Tools , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[14] Luka Daoud,et al. A Survey of High Level Synthesis Languages, Tools, and Compilers for Reconfigurable High Performance Computing , 2013, ICSS.
[15] Jürgen Teich,et al. HIPAcc: A Domain-Specific Language and Compiler for Image Processing , 2016, IEEE Transactions on Parallel and Distributed Systems.
[16] Jason Cong,et al. Source-to-Source Optimization for HLS , 2016, FPGAs for Software Programmers.
[17] George A. Constantinides,et al. Polyhedral-Based Dynamic Loop Pipelining for High-Level Synthesis , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[18] Patrice Quinton,et al. Polyhedral Bubble Insertion: A Method to Improve Nested Loop Pipelining for High-Level Synthesis , 2013, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[19] Brad L. Hutchings,et al. Enhancing debug observability for HLS-based FPGA circuits through source-to-source compilation , 2018, J. Parallel Distributed Comput..
[20] Vaughn Betz,et al. Networks-on-Chip for FPGAs: Hard, Soft or Mixed? , 2014, TRETS.
[21] J. M. Pierre Langlois,et al. Enhanced Precision Analysis for Accuracy-Aware Bit-Width Optimization Using Affine Arithmetic , 2013, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[22] Alan D. George,et al. Performance Analysis Framework for High-Level Language Applications in Reconfigurable Computing , 2009, TRETS.
[23] Steven J. E. Wilton,et al. Rapid Triggering Capability Using an Adaptive Overlay during FPGA Debug , 2018, ACM Trans. Design Autom. Electr. Syst..
[24] M. H. van Emden,et al. Interval arithmetic: From principles to implementation , 2001, JACM.
[25] Jason Cong,et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[26] Zhenhua Duan,et al. ParRA: A Shared Memory Parallel FPGA Router Using Hybrid Partitioning Approach , 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[27] Jonathan Rose,et al. Exploration and Customization of FPGA-Based Soft Processors , 2007, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[28] Mohsen Imani,et al. QuantHD: A Quantization Framework for Hyperdimensional Computing , 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[29] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[30] J. Gregory Steffan,et al. The Potential for a GPU-Like Overlay Architecture for FPGAs , 2011, Int. J. Reconfigurable Comput..
[31] Jason Cong,et al. An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers , 2014, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[32] Stephen A. Edwards,et al. Compositional Dataflow Circuits , 2019, ACM Trans. Embed. Comput. Syst..
[33] Kevin E. Murray,et al. VTR 8: High Performance CAD and Customizable FPGA Architecture Modelling , 2020 .
[34] Katrina Falkner,et al. Towards Automatic High-Level Code Deployment on Reconfigurable Platforms: A Survey of High-Level Synthesis Tools and Toolchains , 2020, IEEE Access.
[35] Steven Trimberger,et al. Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology , 2015, Proceedings of the IEEE.
[36] Yun Liang,et al. FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow , 2016, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[37] Mickaël Raulet,et al. OpenDF: a dataflow toolset for reconfigurable hardware and multicore systems , 2008, CARN.
[38] Jiaqi Gu,et al. DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement , 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[39] Margaret Martonosi,et al. Decoupling Data Supply from Computation for Latency-Tolerant Communication in Heterogeneous Architectures , 2017, ACM Trans. Archit. Code Optim..
[40] Scott A. Mahlke,et al. PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators , 2002, J. VLSI Signal Process..
[41] Jason Cong,et al. CPU-FPGA Coscheduling for Big Data Applications , 2018, IEEE Design & Test.
[42] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[43] Paolo Ienne,et al. An Out-of-Order Load-Store Queue for Spatial Computing , 2017, ACM Trans. Embed. Comput. Syst..
[44] Steven J. E. Wilton,et al. Signal-Tracing Techniques for In-System FPGA Debugging of High-Level Synthesis Circuits , 2017, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[45] Jason Cong,et al. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[46] Yu Wang,et al. DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-Based CNN Accelerators , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[47] Pat Hanrahan,et al. Darkroom , 2014, ACM Trans. Graph..
[48] Giovanni Ansaloni,et al. Leveraging Prior Knowledge for Effective Design-Space Exploration in High-Level Synthesis , 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[49] Olivier Sentieys,et al. A frame-based domain-specific language for rapid prototyping of FPGA-based software-defined radios , 2014, EURASIP J. Adv. Signal Process..
[50] Vaughn Betz,et al. Efficient and Deterministic Parallel Placement for FPGAs , 2011, TODE.
[51] Wayne Luk,et al. Accuracy Guaranteed Bit-Width Optimization , 2022, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[52] Miriam Leeser,et al. VFloat: A Variable Precision Fixed- and Floating-Point Library for Reconfigurable Hardware , 2010, TRETS.
[53] Muhammad Faisal Siddiqui,et al. FPGA Based Real-Time Implementation of Online EMD With Fixed Point Architecture , 2019, IEEE Access.