Two-Level Microprocessor-Accelerator Partitioning

The integration of microprocessors and field-programmable gate array (FPGA) fabric on a single chip increases both the utility and the necessity of tools that automatically move software functions from the microprocessor to accelerators on the FPGA to improve performance or reduce energy. Such hardware/software partitioning for modern FPGAs involves partitioning functions among two levels of accelerator groups: tightly-coupled accelerators, which have fast single-clock-cycle access to the microprocessor's memory, and loosely-coupled accelerators, which access memory through a bridge so that their longer critical paths do not lengthen the main clock period. This new two-level accelerator-partitioning problem is introduced, and a novel optimal dynamic programming algorithm is described to solve it. By exploiting the size constraint imposed by FPGAs, the algorithm has what is effectively quadratic runtime complexity. It runs in just a few seconds for examples with up to 25 accelerators and obtains an average performance improvement of 35% compared to a traditional single-level bus architecture.
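
The trade-off described above can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give; it is a toy knapsack-style dynamic program under assumptions made only for illustration. The names `Accel` and `best_partition`, the cost model (execution time equals remaining main-clock cycles times the main clock period), and all numbers are hypothetical. Each function either stays in software, becomes a loosely-coupled accelerator, or becomes a tightly-coupled accelerator; a tightly-coupled accelerator is allowed only if its critical path fits within the chosen main clock period, and the total area must fit the FPGA.

```python
from typing import Dict, List, NamedTuple, Tuple


class Accel(NamedTuple):
    """Hypothetical accelerator description (all fields are assumptions)."""
    name: str
    area: int            # FPGA area units
    crit_path_ns: float  # critical path if placed on the main clock
    tight_saved: int     # main-clock cycles saved when tightly coupled
    loose_saved: int     # cycles saved when placed behind the bridge


def best_partition(accels: List[Accel], sw_cycles: int,
                   base_period_ns: float,
                   area_budget: int) -> Tuple[float, float, Dict[str, str]]:
    """Return (time_ns, period_ns, placement) minimizing time in the toy model."""
    best = None
    # Candidate main clock periods: the base period, or one stretched to
    # admit a slower accelerator into the tightly-coupled group.
    periods = sorted({base_period_ns} |
                     {max(base_period_ns, a.crit_path_ns) for a in accels})
    for period in periods:
        # dp[cap] = (max cycles saved using at most `cap` area, choices so far)
        dp = [(0, ())] * (area_budget + 1)
        for a in accels:
            new = []
            for cap in range(area_budget + 1):
                opts = [(dp[cap][0], dp[cap][1] + ("sw",))]
                if a.area <= cap:
                    prev = dp[cap - a.area]
                    opts.append((prev[0] + a.loose_saved, prev[1] + ("loose",)))
                    if a.crit_path_ns <= period:
                        opts.append((prev[0] + a.tight_saved,
                                     prev[1] + ("tight",)))
                new.append(max(opts, key=lambda o: o[0]))
            dp = new
        saved, choice = dp[area_budget]
        time_ns = (sw_cycles - saved) * period
        if best is None or time_ns < best[0]:
            placement = {a.name: c for a, c in zip(accels, choice)}
            best = (time_ns, period, placement)
    return best


# Hypothetical example data: B has a long critical path.
demo = [Accel("A", 4, 8.0, 300, 200),
        Accel("B", 3, 20.0, 400, 250),
        Accel("C", 5, 9.0, 200, 120)]
print(best_partition(demo, sw_cycles=1000, base_period_ns=10.0, area_budget=8))
# -> (4500.0, 10.0, {'A': 'tight', 'B': 'loose', 'C': 'sw'})
```

In this made-up instance, stretching the main clock to 20 ns so that B can be tightly coupled would save more cycles but yield a slower system (6000 ns), so the sketch keeps the 10 ns clock and places B behind the bridge. Because the area budget is fixed by the device, the inner table has constant size, and the work grows with the number of candidate periods times the number of accelerators.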