From Instruction Traces to Specialized Reconfigurable Arrays

This paper presents an offline toolchain that automatically extracts loops (Megablocks) from MicroBlaze instruction traces and generates a tailored Reconfigurable Processing Unit (RPU) for those loops. The system migrates the loops from the CPU to the RPU transparently, at runtime, and without changing the executable binaries. The system was implemented on an FPGA; for the tested kernels, measured speedups ranged from 3.9x to 18.2x over a MicroBlaze CPU without cache. We estimate speedups from 1.03x to 2.01x when comparing against the best estimated performance achieved with a single MicroBlaze.
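
To make the idea of extracting a loop from an instruction trace concrete, the following is a minimal sketch, not the paper's actual detection method. It assumes the trace is available as a list of instruction addresses and treats a loop candidate as any address sequence that repeats back-to-back. The function name `find_repeating_pattern` and the parameters `max_len` and `min_reps` are hypothetical, chosen only for illustration.

```python
def find_repeating_pattern(trace, max_len=32, min_reps=3):
    """Return (start, length, repetitions) for the first address sequence
    that repeats back-to-back at least min_reps times, or None.

    Illustrative only: a brute-force scan, not the authors' algorithm.
    """
    n = len(trace)
    for start in range(n):
        for length in range(1, max_len + 1):
            # Not enough trace left to hold min_reps copies of this length.
            if start + length * min_reps > n:
                break
            pattern = trace[start:start + length]
            reps = 1
            pos = start + length
            # Count consecutive copies of the candidate pattern.
            while trace[pos:pos + length] == pattern:
                reps += 1
                pos += length
            if reps >= min_reps:
                return start, length, reps
    return None


# Hypothetical trace: two setup instructions, a 3-instruction loop body
# executed three times, then an exit instruction.
example_trace = [
    0x100, 0x104,
    0x200, 0x204, 0x208,
    0x200, 0x204, 0x208,
    0x200, 0x204, 0x208,
    0x300,
]
result = find_repeating_pattern(example_trace)
```

Here `result` identifies the repeated three-address body starting at index 2 of the trace. A real toolchain would also need to handle branches within the loop body and to decide which candidates are worth mapping to hardware; this sketch ignores both.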
