Supporting runtime reconfigurable VLIWs cores through dynamic binary translation

Single-ISA heterogeneous multi-cores such as the ARM big.LITTLE have proven to be an attractive way to explore different energy/performance trade-offs. Such architectures combine out-of-order (OoO) cores with smaller in-order cores to offer different power/energy profiles. However, they do not fully exploit the characteristics of the workload (compute-intensive vs. control-dominated). In this work, we propose to enrich these architectures with runtime-reconfigurable VLIW cores, which are very efficient on compute-intensive kernels. To preserve the single-ISA programming model, we rely on Dynamic Binary Translation (DBT) and use it to enable dynamic code specialization for runtime-reconfigurable VLIW cores. Our proposed DBT framework targets the RISC-V ISA, for which both OoO and in-order implementations exist. Our experimental results show that this approach can achieve best-case performance and energy efficiency when compared against static VLIW configurations.
