Integrating variable-latency components into high-level synthesis

Conventional high-level synthesis (HLS) techniques assume that the components used as building blocks (e.g., functional units) have fixed latencies. Variable-latency units, in contrast, take a number of cycles to compute their outputs that depends on the input values. While variable-latency units offer potential for performance improvement, we demonstrate that realizing this potential requires HLS to be adapted suitably: sub-optimal use of variable-latency units can lead to performance degradation or unnecessarily high area overheads. Our techniques for incorporating variable-latency units into HLS maximize the performance improvement while minimizing area overheads or satisfying resource constraints. These techniques are not restricted to specific HLS tools or algorithms, and can be plugged into any generic HLS system. Since the use of variable-latency units may still incur area overheads, we present a novel technique, based on the concept of reduced variable-latency units, to reduce these overheads further. Reduced variable-latency units implement only the low-latency case behavior of complete variable-latency units. We demonstrate that the use of reduced variable-latency units significantly reduces area overheads, and sometimes improves performance while simultaneously reducing the area of the register-transfer-level implementation. Experimental results show that the proposed variable-latency-unit-based synthesis techniques achieve a performance improvement of up to 1.6× (1.4× on average) over a state-of-the-art HLS tool, with minimal area overheads (5.3% on average). The use of reduced variable-latency units leads to a performance improvement of up to 1.6× (1.3× on average), with a simultaneous area reduction of up to 17.9% (10.6% on average).
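To make the trade-off concrete, the following is a minimal sketch of the arithmetic behind a variable-latency unit. It is not the paper's synthesis method. It assumes a simple two-case model in which the unit finishes in one short clock cycle for "common" inputs and needs a second cycle otherwise, and it compares this against a fixed-latency unit clocked at its worst-case delay. The function names, the probability `p_fast`, and the clock periods are all hypothetical and chosen only for illustration.

```python
def expected_cycles(p_fast, fast_cycles=1, slow_cycles=2):
    """Average cycle count of a unit that takes `fast_cycles` with
    probability `p_fast` and `slow_cycles` otherwise (assumed model)."""
    if not 0.0 <= p_fast <= 1.0:
        raise ValueError("p_fast must lie in [0, 1]")
    return p_fast * fast_cycles + (1.0 - p_fast) * slow_cycles


def speedup_over_fixed(p_fast, fixed_period_ns, variable_period_ns,
                       fast_cycles=1, slow_cycles=2):
    """Ratio of the fixed-latency unit's time per operation (one cycle at
    `fixed_period_ns`) to the variable-latency unit's average time."""
    variable_time_ns = (
        expected_cycles(p_fast, fast_cycles, slow_cycles) * variable_period_ns
    )
    return fixed_period_ns / variable_time_ns


if __name__ == "__main__":
    # Hypothetical numbers: a 10 ns fixed unit versus a variable-latency
    # unit clocked at 6 ns that needs a second cycle for "hard" inputs.
    for p_fast in (0.9, 0.5, 0.2):
        ratio = speedup_over_fixed(p_fast, fixed_period_ns=10.0,
                                   variable_period_ns=6.0)
        print(f"p_fast={p_fast:.1f}: speedup {ratio:.2f}x")
```

Under these assumed numbers the variable-latency unit is faster when the low-latency case dominates and slower than the fixed-latency unit when it is rare, which is one simple way to see why careless use of such units can degrade performance.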
