High-Level Synthesis With Behavioral-Level Multicycle Path Analysis

High-level synthesis (HLS) tools generate register-transfer level (RTL) hardware descriptions from behavioral-level specifications through resource allocation, scheduling, and binding. Traditionally, HLS tools build datapath pipelines by inserting pipeline registers to break combinational logic into single-cycle segments, because accurately determining the number of cycles available for signal propagation has been proven infeasible at the RT level. RT-level timing analyses must therefore pessimistically assume that each path has at most one cycle for signal propagation. This assumption leads to false positives in critical-path analyses, prevents RTL synthesis tools from optimizing the real critical paths, and forces HLS flows to insert pipeline registers that do not improve hardware quality. In this paper, we present an efficient behavioral-level multicycle path analysis (BL-MCPA) algorithm that leverages control-data flow information to reduce the time complexity of multicycle path analysis from exponential to polynomial. BL-MCPA helps eliminate false positives in timing analysis and improves the reported fmax by 15% on average. With BL-MCPA, we avoid unnecessary pipeline register insertion and reduce execution latency by 25% and register usage by 29% under a user-specified fmax constraint of 300 MHz. Using BL-MCPA, we also replace large multiplexers (MUXs) with pipelined MUX-trees and reduce the execution latency of the hardware by up to 67% on designs whose performance is limited by large MUXs.
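
To make the effect of the single-cycle assumption concrete, the following minimal sketch compares the clock period a timing report would demand under the pessimistic assumption with the period demanded once each path is credited with the cycles its schedule allows. It is an illustration only and not the BL-MCPA algorithm: it assumes a straight-line schedule in which a path's available cycles are simply the distance between the control step that writes the source register and the step that latches the result, and every name in it (`available_cycles`, `required_period_ns`, `report`, and the fields `delay_ns`, `def_step`, `use_step`) is hypothetical.

```python
def available_cycles(def_step, use_step):
    """Cycles a signal has to propagate, assuming a straight-line schedule.

    def_step: control step in which the source register is written.
    use_step: control step in which the destination register latches.
    (Simplifying assumption: no branches or loops are modeled here.)
    """
    if use_step <= def_step:
        raise ValueError("consumer must be scheduled after producer")
    return use_step - def_step


def required_period_ns(delay_ns, cycles):
    """Smallest clock period that lets a path of delay_ns settle in `cycles` cycles."""
    return delay_ns / cycles


def report(paths):
    """Return (single_cycle_period, multicycle_aware_period) in ns.

    The first value treats every path as single-cycle (the pessimistic
    RT-level view); the second divides each delay by its available cycles.
    """
    single = max(p["delay_ns"] for p in paths)
    multi = max(
        required_period_ns(
            p["delay_ns"], available_cycles(p["def_step"], p["use_step"])
        )
        for p in paths
    )
    return single, multi


# Hypothetical example data: two register-to-register paths.
paths = [
    {"delay_ns": 6.0, "def_step": 0, "use_step": 2},  # has two cycles
    {"delay_ns": 4.0, "def_step": 1, "use_step": 2},  # has one cycle
]
single_ns, multi_ns = report(paths)
print(f"single-cycle view: {single_ns} ns, multicycle-aware view: {multi_ns} ns")
```

In this made-up example the 6 ns path is a false positive: it has two cycles to settle, so the 4 ns single-cycle path is the one that actually limits the clock.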
