Dynamic coprocessor management for FPGA-enhanced compute platforms

Various commercial programmable compute platforms have their processor architecture enhanced with field-programmable gate arrays (FPGAs). In a common usage scenario, an application loads custom processors into the FPGA to speed up application execution compared to processor-only execution. Transient applications, changing application workloads, and limited FPGA capacity have led to a new problem of operating-system-controlled dynamic management of the loading of coprocessors into the FPGAs for best overall performance or energy. We define the Dynamic Coprocessor Management problem and provide a mapping to an online optimization problem known as Metrical Task Systems. We introduce a robust heuristic, called the fading cumulative benefit (FCBenefit) heuristic, that outperforms other heuristics, including a previously developed one for MTS. For two distinct application sets, we generate numerous workloads and show that the FCBenefit heuristic provides best results across all considered workloads. In our simulations, the heuristic's results were within 9% of the offline optimal for performance, and within 3% for energy. The heuristic may be applicable to a wide variety of dynamic architecture management problems.

[1]  Andrew Tomkins,et al.  A polylog(n)-competitive algorithm for metrical task systems , 1997, STOC '97.

[2]  John Wawrzynek,et al.  Garp: a MIPS processor with a reconfigurable coprocessor , 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Cat. No.97TB100186).

[3]  Luca Benini,et al.  Low-power task scheduling for multiple devices , 2000, CODES '00.

[4]  John W. Lockwood,et al.  Dynamic hardware plugins in an FPGA with partial run-time reconfiguration , 2002, DAC '02.

[5]  Scott Hauck,et al.  The Chimaera reconfigurable functional unit , 1997, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[6]  Scott Hauck,et al.  Configuration prefetch for single context reconfigurable coprocessors , 1998, FPGA '98.

[7]  Zhiyuan Li,et al.  Configuration relocation and defragmentation for run-time reconfigurable computing , 2002, IEEE Trans. Very Large Scale Integr. Syst..

[8]  Dominique Lavenier,et al.  Evaluation of the streams-C C-to-FPGA compiler: an applications perspective , 2001, FPGA '01.

[9]  Frank Vahid,et al.  Energy Advantages of Microprocessor Platforms with On-Chip Configurable Logic , 2002, IEEE Des. Test Comput..

[10]  L. Benini,et al.  Low-power task scheduling for multiple devices , 2000, Proceedings of the Eighth International Workshop on Hardware/Software Codesign. CODES 2000 (IEEE Cat. No.00TH8518).

[11]  Allan Borodin,et al.  An optimal on-line algorithm for metrical task system , 1992, JACM.

[12]  E TaylorDavid,et al.  Dynamic hardware plugins , 2002 .

[13]  Juanjo Noguera,et al.  Dynamic run-time HW/SW scheduling techniques for reconfigurable architectures , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).

[14]  Frank Vahid,et al.  Warp Processing: Dynamic Translation of Binaries to FPGA Circuits , 2008, Computer.

[15]  Kazuo Iwama,et al.  Average-Case Competitive Analyses for Ski-Rental Problems , 2002, Algorithmica.

[16]  Avrim Blum,et al.  On-line Learning and the Metrical Task System Problem , 1997, COLT '97.

[17]  Luciano Lavagno,et al.  Scheduling for Embedded Real-Time Systems , 1998, IEEE Des. Test Comput..

[18]  Victor Lee,et al.  The RAW benchmark suite: computation structures for general purpose computing , 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines Cat. No.97TB100186).

[19]  Domingo Benitez Performance of remote FPGA-based coprocessors for image-processing applications , 2002, Proceedings Euromicro Symposium on Digital System Design. Architectures, Methods and Tools.