Loop Parallelization Techniques for FPGA Accelerator Synthesis
[1] Jason Helge Anderson,et al. LegUp: high-level synthesis for FPGA-based processor/accelerator systems , 2011, FPGA '11.
[2] Jason Cong,et al. AutoPilot: A Platform-Based ESL Synthesis System , 2008 .
[3] Jürgen Teich,et al. Loop coarsening in C-based High-Level Synthesis , 2015, 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[4] Fabrizio Ferrandi,et al. Exploiting Outer Loops Vectorization in High Level Synthesis , 2015, ARCS.
[5] Monica S. Lam,et al. Retrospective: Software Pipelining: An Effective Scheduling Technique for VLIW Machines , 1998 .
[6] Dejan Markovic,et al. A Multi-Granularity FPGA With Hierarchical Interconnects for Efficient and Flexible Mobile Computing , 2015, IEEE Journal of Solid-State Circuits.
[7] Martin Odersky,et al. Making domain-specific hardware synthesis tools cost-efficient , 2013, 2013 International Conference on Field-Programmable Technology (FPT).
[8] Jürgen Teich,et al. Code generation from a domain-specific language for C-based HLS of hardware accelerators , 2014, 2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[9] Donald G. Bailey,et al. Design for Embedded Image Processing on FPGAs , 2011 .
[10] Pat Hanrahan,et al. Darkroom , 2014, ACM Trans. Graph.
[11] Anil K. Jain,et al. Computer Vision Algorithms on Reconfigurable Logic Arrays , 1999, IEEE Trans. Parallel Distributed Syst.
[12] Implementing FPGA Design with the OpenCL Standard , 2010 .
[13] Jürgen Teich,et al. FPGA-based accelerator design from a domain-specific language , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[14] Alain Darte,et al. Optimizing remote accesses for offloaded kernels: Application to high-level synthesis for FPGA , 2012, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[15] Muhsen Owaida,et al. Synthesis of Platform Architectures from OpenCL Programs , 2011, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines.
[16] Vinod Kathail,et al. An Integrated Framework for Application Engine Synthesis and Verification from High Level C Algorithms , 2008 .
[17] Dejan Markovic,et al. 27.5 A multi-granularity FPGA with hierarchical interconnects for efficient and flexible mobile computing , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[18] Stephen Dean Brown,et al. Exploiting Task- and Data-Level Parallelism in Streaming Applications Implemented in FPGAs , 2013, TRETS.
[19] Jason Helge Anderson,et al. From software threads to parallel hardware in high-level synthesis for FPGAs , 2013, 2013 International Conference on Field-Programmable Technology (FPT).
[20] Jason Cong,et al. Polyhedral-based data reuse optimization for configurable computing , 2013, FPGA '13.
[21] Vinod Kathail,et al. Algorithmic Synthesis Using PICO , 2008 .
[22] Roberto Manduchi,et al. Bilateral filtering for gray and color images , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).
[23] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[24] Michael Meredith. High-Level SystemC Synthesis with Forte's Cynthesizer , 2008 .
[25] Donald G. Bailey,et al. Design for Embedded Image Processing on FPGAs , 2011 .
[26] Paul Feautrier,et al. Polyhedron Model , 2011, Encyclopedia of Parallel Computing.
[27] Jürgen Teich,et al. PARO: Synthesis of Hardware Accelerators for Multi-Dimensional Dataflow-Intensive Applications , 2008, ARC.
[28] Jason Cong,et al. Throughput Optimization for High-Level Synthesis Using Resource Constraints , 2014 .
[29] Kazutoshi Wakabayashi,et al. C-based SoC design flow and EDA tools: an ASIC and system vendor perspective , 2000, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.
[30] Marc Reichenbach,et al. A Generic VHDL Template for 2D Stencil Code Applications on FPGAs , 2012, 2012 IEEE 15th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops.
[31] Geppino Pucci,et al. Universality in VLSI Computation , 2011, ParCo 2011.
[32] Jürgen Teich,et al. HIPAcc: A Domain-Specific Language and Compiler for Image Processing , 2016, IEEE Transactions on Parallel and Distributed Systems.
[33] G. Amdahl,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[34] Jason Cong,et al. FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs , 2009, 2009 IEEE 7th Symposium on Application Specific Processors.
[35] Sang-Yong Han,et al. Exploiting Spatial and Temporal Parallelism in the Multithreaded Node Architecture Implemented on Superscalar RISC Processors , 1993, 1993 International Conference on Parallel Processing - ICPP'93.
[36] David Padua,et al. Encyclopedia of Parallel Computing , 2011 .
[37] Jürgen Teich,et al. Code generation for embedded heterogeneous architectures on android , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[38] Albert Cohen,et al. Polyhedral-Model Guided Loop-Nest Auto-Vectorization , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[39] Adrian Park,et al. Designing Modular Hardware Accelerators in C with ROCCC 2.0 , 2010, 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines.
[40] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.