Architecture for Transparent Binary Acceleration of Loops with Memory Accesses

This paper presents an extension to a hardware/software system architecture in which repetitive instruction traces, called Megablocks, are accelerated by a Reconfigurable Processing Unit (RPU). This scheme is supported by a custom toolchain that automatically generates an RPU tailored for the execution of one or more Megablocks detected offline. Switching between hardware and software execution is done transparently, without modifications to source code or executable binaries. Our approach has been evaluated using an architecture with a MicroBlaze General Purpose Processor (GPP) softcore. Through a memory sharing mechanism, the RPU can access the GPP's data memory, which allows the acceleration of Megablocks containing load/store operations. For a set of 21 embedded benchmarks, an average speedup of 1.43× is achieved, and a potential speedup of 2.09× is predicted for an implementation using a low-overhead interface for communication between GPP and RPU.
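To illustrate why the GPP-RPU communication interface can dominate the achievable speedup, the following is a minimal Amdahl-style sketch. It is not the evaluation model used in this work: the function `estimated_speedup`, its parameters, and all cycle counts are hypothetical and chosen only for illustration. It assumes that each invocation of the RPU pays a fixed communication cost in cycles.

```python
def estimated_speedup(total_cycles, megablock_cycles, rpu_cycles,
                      calls, overhead_per_call):
    """Estimate overall speedup when Megablocks move from the GPP to the RPU.

    total_cycles      -- software-only execution time (hypothetical)
    megablock_cycles  -- part of total_cycles spent inside Megablocks
    rpu_cycles        -- time the RPU needs for the same Megablock work
    calls             -- number of GPP-to-RPU invocations
    overhead_per_call -- assumed fixed communication cost per invocation
    """
    if not 0 <= megablock_cycles <= total_cycles:
        raise ValueError("megablock_cycles must lie within total_cycles")
    # Cycles that still run in software on the GPP.
    remaining_sw = total_cycles - megablock_cycles
    # Accelerated run: leftover software + RPU work + interface overhead.
    accelerated = remaining_sw + rpu_cycles + calls * overhead_per_call
    return total_cycles / accelerated


# Hypothetical workload: 80% of the cycles fall inside Megablocks.
high_overhead = estimated_speedup(1_000_000, 800_000, 200_000, 1_000, 400)
low_overhead = estimated_speedup(1_000_000, 800_000, 200_000, 1_000, 40)
print(f"high-overhead interface: {high_overhead:.2f}x")
print(f"low-overhead interface:  {low_overhead:.2f}x")
```

Under these assumed numbers, reducing only the per-call overhead raises the estimated speedup without changing the RPU itself, which is the same qualitative effect as the gap between the measured and the predicted speedups reported above.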
