Exploiting Bit-Level Delay Calculations to Soften Read-After-Write Dependences in Behavioral Synthesis

Conventional high-level synthesis (HLS) algorithms are very conservative when dealing with read-after-write (RAW) dependences: an operation is allowed to execute only once all its predecessors have been fully computed. However, in the execution of arithmetic operations, some bits are required later than others, and some bits are produced earlier than others. This paper proposes a presynthesis optimization algorithm that relaxes RAW dependences, taking advantage of this feature for a more efficient HLS of data flow graphs formed by additions, multiplications, and logic operations. The presented preprocessor analyzes the critical path at bit granularity and splits the arithmetic operations into subword fragments. These fragments become the input to any regular HLS tool, which can reduce circuit execution time by scheduling fragments obtained from the same original operation in different cycles. This way, the execution of one operation may begin before the computation of its predecessors has been completed. This is feasible when the execution of the predecessor has begun in the selected cycle or in a previous one, even if it finishes in a later cycle. Experimental results show that implementations obtained from the optimized specification are, on average, 70% faster, with only slight variations in the data path area.
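
The bit-level view described above can be illustrated with a small sketch. It is not the paper's algorithm: it assumes ripple-carry adders with a unit delay per full adder, and the names `split_add` and `bit_arrival_times` are hypothetical. The first function shows that an addition can be computed as subword fragments linked by carries; the second models when each result bit of a chain of dependent additions becomes stable.

```python
def split_add(a, b, width, frag_width):
    """Add two unsigned width-bit values as frag_width-bit fragments.

    Each loop iteration stands for one fragment that a scheduler could
    place in its own cycle; only the carry links consecutive fragments.
    Assumes 0 <= a, b < 2**width (illustrative helper, not from the paper).
    """
    mask = (1 << frag_width) - 1
    carry = 0
    result = 0
    for lo in range(0, width, frag_width):
        s = ((a >> lo) & mask) + ((b >> lo) & mask) + carry
        result |= (s & mask) << lo      # low bits of the fragment sum
        carry = s >> frag_width         # carry into the next fragment
    return result & ((1 << width) - 1)  # drop the final carry-out


def bit_arrival_times(num_adds, width):
    """Arrival time of each result bit after a chain of dependent additions.

    Simplified model (an assumption): ripple-carry adders, one time unit
    per full adder, primary inputs available at time 0.
    """
    ready = [0] * width                 # bit arrival times of the running operand
    for _ in range(num_adds):
        carry_ready = 0
        out = []
        for i in range(width):
            t = max(ready[i], carry_ready) + 1
            out.append(t)
            carry_ready = t             # carry-out is ready with the sum bit
        ready = out
    return ready


if __name__ == "__main__":
    print(split_add(12345, 54321, 16, 4) == (12345 + 54321) & 0xFFFF)
    print(bit_arrival_times(3, 16)[-1], "vs", 3 * 16)
```

Under this model, the last bit of three dependent 16-bit additions is stable after 18 unit delays, compared with 48 if each addition had to wait for its predecessor to finish completely. The low-order bits of each result arrive early enough for the successor to start on them, which is the slack that a fragment-based schedule can use.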
