A configurable logic architecture for dynamic hardware/software partitioning

In previous work, we showed the benefits and feasibility of having a processor dynamically partition its executing software so that critical software kernels are transparently moved to a hardware coprocessor on configurable logic, an approach we call warp processing. Place and route for the configurable logic is the most computationally intensive step of such hardware/software partitioning, normally running for many minutes or hours on powerful desktop processors. Dynamic partitioning, in contrast, requires place and route to execute in just seconds on a lean embedded processor. We have therefore designed a configurable logic architecture specifically for dynamic hardware/software partitioning. Through experiments with popular benchmarks, we show that by designing the FPGA architecture for the specific goal of software kernel speedup, rather than the more general goal of ASIC prototyping, we can perform place and route for our architecture 50 times faster, using 10,000 times less data memory and 1,000 times less code memory, than popular commercial tools mapping to commercial configurable logic. We also show that, even when partitioning just one loop, we obtain speedups (2x on average, and as much as 4x) and energy savings (33% on average, and up to 74%) comparable to those of commercial tools and fabrics. Thus, our configurable logic architecture is a good candidate for platforms that will support dynamic hardware/software partitioning, and it enables ultra-fast desktop tools for hardware/software partitioning and even for fast configurable logic design in general.
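To put the single-loop speedups in context, the sketch below applies Amdahl's law to a program in which one kernel is moved to hardware. This is a generic back-of-the-envelope model, not the evaluation method of the paper: the function names, the kernel time fraction, and the kernel speedup are all hypothetical, and the model ignores communication overhead and clock-frequency differences between the processor and the configurable logic.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the kernel is accelerated.

    kernel_fraction: share of original execution time spent in the kernel (0..1).
    kernel_speedup:  how many times faster the kernel runs in hardware.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining_software = 1.0 - kernel_fraction
    accelerated_kernel = kernel_fraction / kernel_speedup
    return 1.0 / (remaining_software + accelerated_kernel)


def speedup_ceiling(kernel_fraction: float) -> float:
    """Upper bound on overall speedup as the kernel's hardware time approaches zero."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    # Hypothetical example: a loop taking 60% of runtime, 10x faster in hardware.
    print(f"overall speedup: {overall_speedup(0.6, 10.0):.2f}x")
    # Hypothetical example: the best any hardware could do for a 75% loop.
    print(f"ceiling for a 75% loop: {speedup_ceiling(0.75):.1f}x")
```

Under this model, a single loop can only yield an overall 2x speedup if it accounts for at least half of the original execution time, which is why dynamic partitioning targets the most critical kernels first.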
