Probabilistic Scheduling in High-Level Synthesis

High-level synthesis (HLS) tools automatically transform a high-level program, for example in C/C++, into a low-level hardware description. A key challenge in HLS tools is scheduling, i.e. determining the start time of all the operations in the untimed program. There are three approaches to scheduling: static, dynamic and hybrid. A major shortcoming of existing approaches to scheduling is that the tools either assume the worst-case timing behaviour, which can cause significant performance loss or area overhead, or use simulation-based approaches, which take a long time to explore enough program traces. In this paper, we propose a probabilistic model that allows HLS tools to efficiently explore the timing behaviour of hardware generated from all these scheduling approaches. We capture the performance of the hardware using Petri nets, allowing us to leverage off-the-shelf Petri net analysis tools to make HLS decisions. We demonstrate the utility of our approach by using it to automatically infer the optimal initiation interval (II) for statically scheduled components that form part of a larger dynamically scheduled circuit. An empirical evaluation on a range of benchmarks suggests that by using this approach, on average we incur a 2% overhead in area-delay product (ADP) compared to optimal designs. In contrast, the static analysis in Vitis HLS incurs a 112% ADP overhead, while the throughput analysis in the dynamically scheduled Dynamatic tool incurs a 17% ADP overhead.
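To give a feel for how a Petri net can capture hardware timing, the sketch below computes the steady-state cycle time of a small timed marked graph, a restricted class of Petri net in which every place has exactly one producer and one consumer. It relies on the classical result that the cycle time equals the largest ratio of total delay to token count over all directed cycles. This is a deterministic simplification and not the probabilistic model proposed in the paper; the operation names, delays, and token placement are hypothetical values chosen purely for illustration.

```python
from fractions import Fraction


def simple_cycles(edges):
    """Yield every simple directed cycle as a list of edge indices.

    `edges` is a list of (src, dst, tokens) tuples; each edge stands for a
    place connecting transition `src` to transition `dst`.
    """
    nodes = sorted({n for s, d, _ in edges for n in (s, d)})
    out = {n: [] for n in nodes}
    for i, (s, _, _) in enumerate(edges):
        out[s].append(i)
    for start in nodes:
        # Only walk through nodes that sort after `start`, so each cycle is
        # reported exactly once, rooted at its smallest node.
        stack = [(start, [], {start})]
        while stack:
            node, path, seen = stack.pop()
            for i in out[node]:
                dst = edges[i][1]
                if dst == start:
                    yield path + [i]
                elif dst > start and dst not in seen:
                    stack.append((dst, path + [i], seen | {dst}))


def cycle_time(delays, edges):
    """Return max over cycles of (sum of transition delays) / (tokens)."""
    worst = Fraction(0)
    for cycle in simple_cycles(edges):
        delay = sum(delays[edges[i][0]] for i in cycle)
        tokens = sum(edges[i][2] for i in cycle)
        if tokens == 0:
            raise ValueError("token-free cycle: the net deadlocks")
        worst = max(worst, Fraction(delay, tokens))
    return worst


# Hypothetical loop body: delays are in clock cycles (illustrative only).
delays = {"load": 2, "mul": 3, "add": 1}
edges = [
    ("load", "mul", 0),
    ("mul", "add", 0),
    ("add", "load", 1),  # loop-carried dependency holding one token
    ("mul", "mul", 1),   # the multiplier can start one operation at a time
]
bound = cycle_time(delays, edges)

# Same net with a second token on the loop-carried place.
edges_two_tokens = [e if e[:2] != ("add", "load") else ("add", "load", 2)
                    for e in edges]
bound_two_tokens = cycle_time(delays, edges_two_tokens)
```

In this toy net the loop-carried cycle dominates, so `bound` plays the role of a lower bound on the number of clock cycles between successive iterations, which is the quantity an initiation interval describes. Adding a second token to that cycle halves its ratio, and the multiplier's self-loop becomes equally critical. Cycle enumeration is exponential in general, so this approach only suits small examples.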
