Compatibility path based binding algorithm for interconnect reduction in high level synthesis

This paper describes a register and functional unit (FU) binding algorithm for high-level synthesis that targets the reduction of multiplexer inputs. Because multiplexers connect multiple sources to the inputs of FUs and registers, the multiplexer input count is a good indicator of interconnect complexity, and reducing it reduces interconnect cost. Specifically, our algorithm constructs a weighted and ordered compatibility graph and binds together the operations that form a long path in the graph. As a result, operations with many flow dependencies and common inputs are bound to the same FU, which keeps the number of FU inputs small. In addition, the variables produced by a single FU are assigned to the same register, so that connections between FUs and registers are reduced. We have implemented our algorithm within a MATLAB-to-Verilog conversion tool and applied it to a suite of benchmark programs. Our experimental results show that, on average, the proposed scheme reduces the multiplexer input count by 11.8%, 43.6%, and 58.8% compared with the weighted bipartite matching, k-cofamily, and left-edge algorithms, respectively. To assess the impact on interconnect, we generated layouts of the circuits from their Verilog descriptions. Compared with the best previously proposed scheme, our approach reduces the total wire length of global interconnects by 10.1%, with a minor area overhead in registers and FUs.
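To make the path-based idea concrete, the following Python sketch builds an ordered compatibility graph over already-scheduled operations and repeatedly binds the heaviest remaining path to one FU. It is an illustration under stated assumptions, not the authors' implementation: the edge weight (1 + number of shared inputs + 1 for a flow dependency), the greedy path peeling with no limit on FU count, single-cycle operations, and the multiplexer metric (distinct source variables per FU port, counted only when a port has more than one source) are all hypothetical choices. Register binding and port swapping for commutative operations are omitted, and the names `Op`, `edge_weight`, `heaviest_path`, `bind_fus`, `mux_inputs`, and `OPS` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A scheduled operation (assumed single-cycle)."""
    name: str      # operation identifier
    kind: str      # FU type it needs, e.g. "add" or "mul"
    step: int      # control step chosen by the scheduler
    inputs: tuple  # input variable names, in port order
    output: str    # name of the variable it produces


def edge_weight(u: Op, v: Op) -> int:
    """Hypothetical gain of binding u and v (u runs first) to one FU."""
    shared = len(set(u.inputs) & set(v.inputs))  # common input variables
    flow = 1 if u.output in v.inputs else 0      # v consumes u's result
    return 1 + shared + flow                     # the 1 favors longer paths


def heaviest_path(ops):
    """Maximum-weight path in the ordered compatibility graph of ops.

    An edge u -> v exists when u and v need the same FU type and u is
    scheduled in an earlier step, so the graph is acyclic and sorting by
    step is a topological order. Every path is a legal sharing of one FU.
    """
    order = sorted(ops, key=lambda o: (o.step, o.name))
    best = {o: 0 for o in order}     # heaviest path weight ending at o
    prev = {o: None for o in order}  # predecessor on that path
    for j, v in enumerate(order):
        for u in order[:j]:
            if u.kind == v.kind and u.step < v.step:
                w = best[u] + edge_weight(u, v)
                if w > best[v]:
                    best[v], prev[v] = w, u
    node = max(order, key=lambda o: best[o])
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def bind_fus(ops):
    """Greedily remove heaviest paths; each path is bound to one new FU."""
    remaining = set(ops)
    fus = []
    while remaining:
        path = heaviest_path(remaining)
        fus.append(path)
        remaining -= set(path)
    return fus


def mux_inputs(fus):
    """Sum of distinct sources per FU port, over ports with >1 source.

    Sources are variables here; a real flow would count registers
    after register binding.
    """
    total = 0
    for fu in fus:
        for p in range(max(len(o.inputs) for o in fu)):
            sources = {o.inputs[p] for o in fu if p < len(o.inputs)}
            if len(sources) > 1:
                total += len(sources)
    return total


# Toy schedule (hypothetical): two chains of additions feeding a multiply.
OPS = [
    Op("a1", "add", 1, ("x", "y"), "t1"),
    Op("a2", "add", 1, ("x", "z"), "t2"),
    Op("a3", "add", 2, ("t1", "y"), "t3"),
    Op("a4", "add", 2, ("t2", "z"), "t4"),
    Op("m1", "mul", 3, ("t3", "t4"), "t5"),
]

if __name__ == "__main__":
    fus = bind_fus(OPS)
    print([[o.name for o in fu] for fu in fus])  # [['a1', 'a3'], ['a2', 'a4'], ['m1']]
    print(mux_inputs(fus))                       # 4
```

On this toy schedule the sketch pairs `a1` with `a3` and `a2` with `a4`, because each pair shares an input and is linked by a flow dependency; under the same metric, the alternative pairing (`a1`, `a4`), (`a2`, `a3`) would need 8 multiplexer inputs instead of 4.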
