A study of the scalability of on-chip routing for just-in-time FPGA compilation

Just-in-time (JIT) compilation has been used in many applications to enable standard software binaries to execute on different underlying processor architectures. We previously introduced the concept of a standard hardware binary, using a just-in-time compiler to compile the hardware binary to a field-programmable gate array (FPGA). Our JIT compiler includes lean versions of technology mapping, placement, and routing algorithms, of which routing is the most computationally and memory expensive step. As FPGAs continue to increase in size, a JIT FPGA compiler must be capable of efficiently mapping increasingly larger hardware circuits. In this paper, we analyze the scalability of our lean on-chip router, the Riverside on-chip router (ROCR), for routing increasingly large hardware circuits. We demonstrate that ROCR scales well in terms of execution time, memory usage and circuit quality, and we compare the scalability of ROCR to the well known versatile place and route (VPR) timing-driven routing algorithm, comparing to both their standard routing algorithm and their fast routing algorithm. Our results show that on average ROCR executes 3 times faster using 18 times less memory than VPR. ROCR requires only 1% more routing resources, while creating a critical path 30% longer VPR's standard timing-driven router. Furthermore, for the largest hardware circuit, ROCR executes 3 times faster using 14 times less memory, and results in a critical path 2.6% shorter than VPR's fast timing-driven router.

[1]  Daniel Brélaz,et al.  New methods to color the vertices of a graph , 1979, CACM.

[2]  Andreas Krall,et al.  Efficient JavaVM just-in-time compilation , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).

[3]  Frank Vahid,et al.  A study of the speedups and competitiveness of FPGA soft processor cores using dynamic hardware/software partitioning , 2005, Design, Automation and Test in Europe.

[4]  Vaughn Betz,et al.  Architecture and CAD for Deep-Submicron FPGAS , 1999, The Springer International Series in Engineering and Computer Science.

[5]  Frank Vahid,et al.  A configurable logic architecture for dynamic hardware/software partitioning , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[6]  Jason Cong,et al.  FlowMap: an optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs , 1994, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[7]  Frank Vahid,et al.  Dynamic hardware/software partitioning: a first approach , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[8]  Jonathan Rose,et al.  The effect of logic block architecture on FPGA performance , 1992 .

[9]  Vaughn Betz,et al.  VPR: A new packing, placement and routing tool for FPGA research , 1997, FPL.

[10]  C. Y. Lee An Algorithm for Path Connections and Its Applications , 1961, IRE Trans. Electron. Comput..

[11]  Frank Vahid,et al.  Dynamic FPGA routing for just-in-time FPGA compilation , 2004, Proceedings. 41st Design Automation Conference, 2004..

[12]  Michael Gschwind,et al.  Dynamic and Transparent Binary Translation , 2000, Computer.

[13]  Richard J. Carter,et al.  Plasma: An FPGA for Million Gate Systems , 1996, Fourth International ACM Symposium on Field-Programmable Gate Arrays.

[14]  Carl Ebeling,et al.  Placement and routing tools for the Triptych FPGA , 1995, IEEE Trans. Very Large Scale Integr. Syst..