FLOWER: A comprehensive dataflow compiler for high-level synthesis

FPGAs have found their way into data centers as accelerator cards, making reconfigurable computing more accessible for high-performance applications. At the same time, new high-level synthesis compilers like Xilinx Vitis and runtime libraries such as XRT attract software programmers into the reconfigurable domain. While software programmers are familiar with task-level and data-parallel programming, FPGAs often require different types of parallelism. For example, data-driven parallelism is mandatory to obtain satisfactory hardware designs for pipelined dataflow architectures. However, software programmers are often not acquainted with dataflow architectures, which results in poor hardware designs. In this work, we present FLOWER, a comprehensive compiler infrastructure that provides automatic canonical transformations for high-level synthesis from a domain-specific library. This allows programmers to focus on algorithm implementations rather than low-level optimizations for dataflow architectures. We show that FLOWER enables the synthesis of efficient implementations for high-performance streaming applications in image processing and computer vision, targeting both System-on-Chip and FPGA accelerator cards.
