Efficient Memory Arbitration in High-Level Synthesis From Multi-Threaded Code

High-level synthesis (HLS) is an increasingly popular method for generating hardware from a description written in a software language such as C/C++. Traditionally, HLS tools have operated on sequential code; in recent years, however, there has been a drive to synthesise multi-threaded code. In this context, a major challenge facing HLS tools is how to automatically partition memory among parallel threads so as to fully exploit the bandwidth available on an FPGA device and minimise memory contention. Existing partitioning approaches require inefficient arbitration circuitry to serialise accesses to each bank, because they make conservative assumptions about which threads might access which memory banks. In this article, we design a static analysis that can prove that certain memory banks are only accessed by certain threads, and we use this analysis to simplify or even remove the arbiters while preserving correctness. We show how this analysis can be implemented using the Microsoft Boogie verifier on top of a satisfiability modulo theories (SMT) solver, and propose a tool named EASY that uses automatic formal verification. Our work supports arbitrary input code, including irregular memory access patterns and indirect array addressing. We implement our approach in LLVM and integrate it into the LegUp HLS tool. For a set of typical application benchmarks, our results show that EASY can reduce area to as little as 0.13× (avg. 0.43×) of the baseline and improve performance by up to 1.64× (avg. 1.28×), with little additional compilation time relative to the long time in
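To make the idea of thread-to-bank ownership concrete, the following is a minimal sketch in C, not the analysis used by EASY. It assumes a hypothetical block partitioning (`bank_of`) and a hypothetical access pattern in which each thread reads its own block plus the first element of the next block. It then enumerates every access by brute force, whereas the article proves the same kind of property statically with Boogie and an SMT solver. All names and constants here are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sizes only; none of these values come from the article. */
#define NUM_THREADS 4
#define NUM_BANKS   4
#define ARRAY_SIZE  1024
#define BANK_SIZE   (ARRAY_SIZE / NUM_BANKS)

/* Assumed block partitioning: element i lives in bank i / BANK_SIZE. */
static int bank_of(int index) { return index / BANK_SIZE; }

/* Bit mask of the banks that thread `tid` may touch under the assumed
 * access pattern: its own block plus one element of the next block. */
uint32_t banks_touched(int tid) {
    assert(tid >= 0 && tid < NUM_THREADS);
    uint32_t mask = 0;
    for (int i = 0; i <= BANK_SIZE; i++) {
        int index = tid * BANK_SIZE + i;
        if (index >= ARRAY_SIZE) continue; /* last thread has no next block */
        mask |= 1u << bank_of(index);
    }
    return mask;
}

/* Number of threads that may access `bank`; this is the number of
 * ports an arbiter for that bank would actually need. */
int threads_sharing_bank(int bank) {
    assert(bank >= 0 && bank < NUM_BANKS);
    int count = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        if (banks_touched(t) & (1u << bank)) count++;
    }
    return count;
}

/* A bank reached by at most one thread needs no arbiter at all. */
bool arbiter_removable(int bank) { return threads_sharing_bank(bank) <= 1; }
```

Under these assumptions, bank 0 is reached only by thread 0, so its arbiter could be removed, while banks 1 to 3 are each reached by two threads, so a two-input arbiter would suffice in place of a conservative four-input one. Exhaustive enumeration like this does not scale and cannot handle symbolic inputs, which is the gap a static, solver-backed analysis is meant to fill.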
