Transparent acceleration of data dependent instructions for general purpose processors

Although transistor scaling continues to follow Moore’s law and gives designers more area, clock frequency and ILP are no longer growing at the same rate, so new architectural alternatives are needed. Reconfigurable fabric is one emerging possibility: besides exploiting parallelism among instructions, it can also accelerate sequences of data-dependent ones. However, widespread use of coarse-grain reconfiguration is still held back by the need for special tools and compilers, which prevent legacy code from being reused without modification. This work therefore proposes a new Binary Translation algorithm, implemented in hardware and working in parallel with the processor, that transforms sequences of instructions at run time for execution on a dynamic coarse-grain reconfigurable array tightly coupled to a traditional RISC machine. This lets us use pure combinational logic to optimize even control-flow-oriented code in a totally transparent process, without any modification to the source or binary code. Using the SimpleScalar Toolset together with the embedded benchmark suite MiBench, we report performance improvements and an area evaluation in comparison with traditional superscalar architectures.
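To make the idea of mapping a data-dependent instruction sequence onto an array more concrete, the following is a minimal sketch and not the algorithm proposed in this work. It assumes a hypothetical array organized in rows, where an instruction may be placed in a row only after the rows holding the producers of its source operands. The names `Instr` and `place_in_rows` are illustrative, and only true (read-after-write) dependences are tracked; name hazards are assumed to be resolved elsewhere.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical three-address instruction: op dst, srcs..."""
    op: str
    dst: Optional[str]          # None for instructions that write no register
    srcs: Tuple[str, ...]


def place_in_rows(seq: List[Instr]) -> List[List[Instr]]:
    """Place each instruction in the earliest row that follows every row
    producing one of its source registers (read-after-write only)."""
    last_write_row: Dict[str, int] = {}   # register -> row of its latest writer
    rows: List[List[Instr]] = []
    for ins in seq:
        row = 0
        for reg in ins.srcs:
            if reg in last_write_row:
                # The consumer must sit strictly below its producer.
                row = max(row, last_write_row[reg] + 1)
        while len(rows) <= row:
            rows.append([])
        rows[row].append(ins)
        if ins.dst is not None:
            last_write_row[ins.dst] = row
    return rows


# Illustrative sequence: "sub" depends on "add", "or" depends on "sub" and "and".
example = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r5")),
    Instr("and", "r6", ("r7", "r8")),
    Instr("or", "r9", ("r4", "r6")),
]
placement = place_in_rows(example)
ops_per_row = [[ins.op for ins in row] for row in placement]
print(ops_per_row)
```

In this sketch, independent instructions share a row, while a dependent chain occupies consecutive rows. If consecutive rows are evaluated as one combinational path, such a chain need not wait one clock cycle per instruction, which is the intuition behind accelerating data-dependent sequences.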
