Server-side coprocessor updating for mobile devices with FPGAs

FPGAs are increasingly used to implement coprocessors for applications running on desktop platforms, and such FPGA coprocessing may soon appear in mobile devices. Because the applications run on one device may differ from those run on another, each device needs a coprocessor set suited to its own usage. We introduce an approach wherein a device profiles its application usage and uploads that information to a server when docked. The server then determines the best coprocessor set based on that usage and on the device's particular FPGA constraints. The server builds the coprocessor set by combining pre-synthesized coprocessors for each application, and it considers multiple versions of the same coprocessor, versions that trade off speed and size. We formulate the coprocessor set selection problem and propose a Pareto-optimal merge heuristic for the server that yields near-optimal solutions with linear time complexity. We also use a method that avoids time-consuming resynthesis of the coprocessors into a single FPGA binary by using small reconfigurable regions with reserved inter-region communication channels. Our experiments show that the Pareto-optimal merge heuristic generates results within 1% of optimal on average and runs 5-20x faster than simulated annealing. The experiments also show that using FPGA coprocessors achieves a 3x speedup and a 70% energy reduction compared with running the applications on the microprocessor alone.
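One way to read the selection problem is as a multiple-choice knapsack. For each application, pick at most one coprocessor version so that the chosen versions fit the FPGA and the usage-weighted benefit is maximized. The Python sketch below is a hedged illustration of a Pareto-optimal merge under that reading, not the authors' implementation. It assumes that each version is a hypothetical (area, benefit) pair. Here, area counts reconfigurable-region units and benefit is the profiled, usage-weighted execution time saved. The function names `pareto_filter`, `thin`, and `pareto_merge_select`, the `max_front` cap, and the even-spacing thinning rule are likewise hypothetical. Applications are merged one at a time. Every configuration on the current Pareto front is combined with every version of the next application, including the option of leaving it in software. Only non-dominated configurations that fit the capacity are kept.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a version is (area, benefit); a configuration is
# (total_area, total_benefit, choice), where choice[i] is the version index picked
# for application i, or -1 if application i stays in software.
Version = Tuple[int, float]
Config = Tuple[int, float, Tuple[int, ...]]


def pareto_filter(configs: List[Config], capacity: int) -> List[Config]:
    """Drop configurations that exceed capacity or are dominated
    (another one uses no more area and gives at least as much benefit)."""
    feasible = [c for c in configs if c[0] <= capacity]
    # Area ascending; for equal area, highest benefit first.
    feasible.sort(key=lambda c: (c[0], -c[1]))
    front: List[Config] = []
    best = float("-inf")
    for c in feasible:
        if c[1] > best:  # beats every configuration with smaller or equal area
            front.append(c)
            best = c[1]
    return front


def thin(front: List[Config], k: int) -> List[Config]:
    """Assumed pruning rule: keep k points spread evenly along the front,
    always including the smallest and the largest configuration."""
    if len(front) <= k:
        return front
    step = (len(front) - 1) / (k - 1)
    return [front[round(i * step)] for i in range(k)]


def pareto_merge_select(apps: List[List[Version]], capacity: int,
                        max_front: Optional[int] = None) -> Config:
    """Merge applications one at a time, keeping only the Pareto front.
    With max_front=None the result is exact; with a cap it is a heuristic."""
    if max_front is not None and max_front < 2:
        raise ValueError("max_front must be at least 2")
    front: List[Config] = [(0, 0.0, ())]
    for versions in apps:
        # Option -1 means the application runs on the microprocessor only.
        options = [(0, 0.0, -1)] + [(a, b, j) for j, (a, b) in enumerate(versions)]
        merged = [(fa + oa, fb + ob, fc + (oj,))
                  for fa, fb, fc in front
                  for oa, ob, oj in options]
        front = pareto_filter(merged, capacity)
        if max_front is not None:
            front = thin(front, max_front)
    # The front is never empty for capacity >= 0; return its highest-benefit point.
    return max(front, key=lambda c: c[1])


if __name__ == "__main__":
    # Three hypothetical applications, each with a small/slow and/or large/fast version.
    apps = [[(4, 3.0), (7, 5.0)], [(3, 2.0), (6, 4.5)], [(5, 2.5)]]
    print(pareto_merge_select(apps, capacity=10))               # (10, 7.5, (0, 1, -1))
    print(pareto_merge_select(apps, capacity=10, max_front=2))  # (10, 7.0, (1, 0, -1))
```

Without a cap, the merge is exact on this toy instance. It picks the small version of application 0 and the large version of application 1. Pruning to capacity is safe only because areas are non-negative, and discarding dominated points cannot hurt any later completion. Capping the front at a fixed size bounds the work per merge step, so total work grows linearly with the number of applications. This is one plausible route to the linear complexity claimed above. Here, though, the capped run returns the poorer 7.0 configuration, which illustrates the optimality such a heuristic can give up.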
