A Configurable TLB Hierarchy for the RISC-V Architecture

The Rocket Chip Generator uses a collection of parameterized processor components to produce RISC-V-based SoCs. It can produce a wide variety of processor designs, ranging from tiny embedded processors to complex multi-core systems. In this paper, we extend the Memory Management Unit of the Rocket Chip Generator, specifically its TLB hierarchy. TLBs are essential for performance because they mitigate the overhead of frequent page table walks, but their size and/or associativity may lengthen the critical path of the processor. In the original Rocket Chip implementation, the L1 data and instruction TLBs are fully associative and the shared L2 TLB is direct-mapped. We lift these restrictions by designing and implementing configurable, set-associative L1 and L2 TLB templates that can create any organization from direct-mapped to fully associative, so that a designer can reach the desired trade-off between performance and resource utilization, especially for larger TLBs. We report the area of different configurations and evaluate the overall performance of our design using the SPEC CPU2006 benchmark suite on the Xilinx ZCU102 FPGA. Our design is intended both for ASIC implementation and for FPGA-friendly soft processors. As FPGAs continue to grow in size, it becomes increasingly attainable and desirable to use configurable high-performance soft processors that can run full-fledged operating systems, especially for applications with large memory footprints.
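
To make the range of organizations concrete, the following is a minimal behavioural sketch in Python of a TLB parameterized by its number of sets and ways. It is not the Chisel template described in this paper. The class name `SetAssociativeTLB`, the `walk` callback standing in for a page table walk, the per-set LRU replacement policy, and the 4 KiB page size are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KiB base pages


class SetAssociativeTLB:
    """Behavioural model (hypothetical): sets x ways entries, per-set LRU."""

    def __init__(self, sets, ways):
        # sets == 1 gives a fully associative TLB; ways == 1 gives direct-mapped.
        if sets < 1 or ways < 1 or sets & (sets - 1):
            raise ValueError("sets must be a power of two and ways must be >= 1")
        self.sets = sets
        self.ways = ways
        # One ordered map per set: tag -> physical page number, oldest first.
        self.entries = [OrderedDict() for _ in range(sets)]
        self.hits = 0
        self.misses = 0

    def lookup(self, vaddr, walk):
        """Translate vaddr, calling walk(vpn) -> ppn on a miss."""
        vpn = vaddr >> PAGE_SHIFT
        index = vpn & (self.sets - 1)              # low VPN bits select the set
        tag = vpn >> (self.sets.bit_length() - 1)  # remaining bits form the tag
        entry_set = self.entries[index]
        if tag in entry_set:
            self.hits += 1
            entry_set.move_to_end(tag)             # mark as most recently used
            ppn = entry_set[tag]
        else:
            self.misses += 1
            ppn = walk(vpn)                        # stand-in for a page table walk
            if len(entry_set) >= self.ways:
                entry_set.popitem(last=False)      # evict least recently used
            entry_set[tag] = ppn
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        return (ppn << PAGE_SHIFT) | offset
```

In this model, alternating between virtual pages 0 and 4 misses on every access with `SetAssociativeTLB(4, 1)`, because both pages map to set 0 of a direct-mapped array. With the same four entries organized as `SetAssociativeTLB(2, 2)`, only the first access to each page misses. The sketch captures hit and miss behaviour only; it says nothing about area or critical path, which are the hardware costs the configurable templates are meant to balance.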
