Interconnect-Aware Pipeline Synthesis for Array-Based Architectures

In the deep-submicron era, interconnect delay is becoming one of the most important factors affecting performance in VLSI design. Much state-of-the-art research in high-level synthesis takes the effect of interconnect delays into account, and these approaches do achieve better performance than traditional ones that ignore interconnect delays. When applications contain large loops, however, there is still much room to improve performance by exploiting parallelism across loop iterations. In this paper, we propose, for the first time, a method that combines pipelining techniques with consideration of interconnect delays to improve the quality of high-level synthesis. The proposed method has two characteristics: 1) it treats interconnect delay separately from computation delay, allowing data transfer and computation to proceed concurrently; 2) it belongs to the modulo scheduling framework, in the sense that all iterations have identical schedules and are initiated periodically. We evaluate our method from two points of view. First, we compare it with an existing interconnect-aware high-level synthesis method that does not use pipelining, and the experimental results show that our method improves performance by a factor of about 3.4 on average. Second, we compare it with an existing pipeline synthesis method that does not consider interconnect delays, and the results show an average performance improvement by a factor of about 1.5. In addition, we evaluate our proposed architecture, and the experimental results show that it outperforms the existing architecture in [1].
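The Python sketch below illustrates the two characteristics above. It checks whether a given modulo schedule on a two-dimensional array of processing elements (PEs) is valid. It is not the authors' algorithm, only a validity checker built on several hypothetical assumptions:

* Transfer delay is the Manhattan distance between PEs, at one cycle per hop.
* PEs are not internally pipelined.
* The names `Edge`, `transfer_delay`, and `check_modulo_schedule` are illustrative.

Interconnect delay enters only the dependence constraint, separately from computation latency. The resource constraint counts only the cycles a PE spends computing, so transfers overlap with computation. Every iteration reuses the same start times, offset by the initiation interval `ii`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Data dependence from src to dst, `distance` iterations apart."""
    src: str
    dst: str
    distance: int = 0  # 0 = same iteration; 1 = next iteration (recurrence)


def transfer_delay(pe_a, pe_b):
    """Hypothetical interconnect model: one cycle per hop of Manhattan distance."""
    return abs(pe_a[0] - pe_b[0]) + abs(pe_a[1] - pe_b[1])


def check_modulo_schedule(latency, edges, placement, start, ii):
    """Return a list of violations; an empty list means the schedule is valid.

    latency:   op -> computation latency in cycles
    placement: op -> (row, col) of the PE executing the op
    start:     op -> start cycle in iteration 0; iteration k starts at start + k * ii
    ii:        initiation interval shared by all iterations
    """
    violations = []
    # Dependences: src must finish computing AND its result must cross the
    # interconnect before dst starts. Transfer delay is added separately
    # from computation latency.
    for e in edges:
        ready = (start[e.src] + latency[e.src]
                 + transfer_delay(placement[e.src], placement[e.dst]))
        if ready > start[e.dst] + ii * e.distance:
            violations.append(f"dependence {e.src}->{e.dst}")
    # Resources: a PE is busy only while computing (transfers use the
    # interconnect concurrently). PEs are assumed non-pipelined, so every
    # busy cycle must occupy a distinct slot modulo ii on its PE.
    busy = {}
    for op, pe in placement.items():
        for cycle in range(start[op], start[op] + latency[op]):
            slot = (pe, cycle % ii)
            if slot in busy:
                violations.append(f"resource conflict {busy[slot]}/{op} on PE {pe}")
            else:
                busy[slot] = op
    return violations


# Hypothetical loop body: a multiply feeds an add, whose result updates an
# accumulator that is carried to the next iteration.
latency = {"mul": 2, "add": 1, "acc": 1}
edges = [Edge("mul", "add"), Edge("add", "acc"), Edge("acc", "acc", distance=1)]
placement = {"mul": (0, 0), "add": (0, 1), "acc": (1, 1)}
start = {"mul": 0, "add": 3, "acc": 5}  # add = 0 + 2 + 1 hop, acc = 3 + 1 + 1 hop

print(check_modulo_schedule(latency, edges, placement, start, ii=2))  # []
print(check_modulo_schedule(latency, edges, placement, start, ii=1))
# ['resource conflict mul/mul on PE (0, 0)']
far = dict(placement, add=(1, 2))  # add now three hops from mul instead of one
print(check_modulo_schedule(latency, edges, far, start, ii=2))
# ['dependence mul->add']
```

In this toy example, `ii=2` is feasible. With `ii=1`, the two-cycle multiply collides with its own next iteration on PE (0, 0). Keeping `ii=2` but moving `add` farther away violates the `mul->add` dependence, because the longer transfer is not hidden by the fixed start times. This is the kind of interaction between placement and schedule that an interconnect-aware pipeline scheduler has to handle.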

[1] Josep Llosa, et al. Lifetime-Sensitive Modulo Scheduling in a Production Environment, 2001, IEEE Trans. Computers.

[2] Kiyoung Choi, et al. High-level synthesis under multi-cycle interconnect delay, 2001, ASP-DAC '01.

[3] Mircea R. Stan, et al. 5-GHz 32-bit Integer Execution Core in 130-nm Dual-VT CMOS, 2001.

[4] Jason Cong, et al. Architecture-level synthesis for automatic interconnect pipelining, 2004, Proceedings of the 41st Design Automation Conference.

[5] R. Engelbrecht, et al. Digest of Technical Papers, 1959.

[6] Keikichi Tamaru, et al. A Floorplan Based Methodology for Data-Path Synthesis of Sub-micron ASICs, 1996.

[7] Kiyoung Choi, et al. Behavior-to-placed RTL synthesis with performance-driven placement, 2001, IEEE/ACM International Conference on Computer Aided Design (ICCAD 2001).

[8] Masahiro Fujita, et al. Pipeline scheduling for array based reconfigurable architectures considering interconnect delays, 2005, Proceedings of the 2005 IEEE International Conference on Field-Programmable Technology.

[9] Masahiro Fujita, et al. Interconnect-aware Pipeline Synthesis for Array based Reconfigurable Architectures, 2007, IESS.

[10] Majid Sarrafzadeh, et al. Layout Driven Data Communication Optimization for High Level Synthesis, 2006, Proceedings of the Design Automation & Test in Europe Conference.

[11] Giovanni De Micheli. Synthesis and Optimization of Digital Circuits, 1994.

[12] Sabih H. Gerez, et al. A Genetic Approach to the Overlapped Scheduling of Iterative Data-Flow Graphs for Target Architectures with Communication Delays, 1997.

[13] Sabih H. Gerez, et al. An Integer Linear Programming Approach to the Overlapped Scheduling of Iterative Data-Flow Graphs for Target Architectures with Communication Delays, 2000.

[14] Ken Kennedy, et al. Conversion of control dependence to data dependence, 1983, POPL '83.

[15] B. Ramakrishna Rau, et al. Iterative modulo scheduling: an algorithm for software pipelining loops, 1994, MICRO 27.

[16] Kaushik Roy, et al. Layout-driven architecture synthesis for high-speed digital filters, 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[17] Vaughn Betz, et al. Timing-driven placement for FPGAs, 2000, FPGA '00.

[18] Fadi J. Kurdahi, et al. Layout-driven high level synthesis for FPGA based architectures, 1998, Proceedings Design, Automation and Test in Europe.

[19] Jason Cong, et al. Architecture and synthesis for on-chip multicycle communication, 2004, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[20] Vicki H. Allan, et al. Software pipelining, 1995, ACM Computing Surveys (CSUR).

[21] J. Tschanz, et al. A 25 GHz 32 b integer-execution core in 130 nm dual-VT CMOS, 2002, 2002 IEEE International Solid-State Circuits Conference, Digest of Technical Papers.

[22] S. Hsu, et al. A 110 GOPS/W 16-bit multiplier and reconfigurable PLA loop in 90-nm CMOS, 2005, IEEE Journal of Solid-State Circuits.

[23] Edwin Hsing-Mean Sha, et al. Architecture-Dependent Loop Scheduling via Communication-Sensitive Remapping, 1995, ICPP.