Post-Fabrication Microarchitecture

Superscalar processor design favors microarchitectural enhancements that improve performance generally, across many workloads. Targeting general performance is necessary, but it also constrains some microarchitectural innovation. We explore relieving this constraint through a new paradigm called Post-Fabrication Microarchitecture (PFM). A high-performance superscalar core is coupled with a reconfigurable logic fabric (RF). A programmable interface, the Agent, allows RF to observe and microarchitecturally intervene at key pipeline stages of the superscalar core. New application-specific microarchitectural components are synthesized on demand onto RF. All instructions still flow through the superscalar pipeline as usual, but their execution is streamlined, yielding higher instructions per cycle (IPC), through microarchitectural intervention by RF. Our research shows that large speedups of individual applications can be achieved by analyzing their bottlenecks and providing customized microarchitectural solutions that target them. PFM use cases explored in this paper include custom branch predictors and data prefetchers.
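
The observe-and-intervene idea can be illustrated with a toy software model. The sketch below is not the paper's Agent interface or any of its components. The hook names `intervene` and `observe`, the `LoopExitComponent` class, the bimodal baseline, and the synthetic trace are all assumptions made for illustration. The model ignores pipeline timing and counts only mispredictions, not IPC.

```python
from typing import Dict, List, Optional, Tuple


class BimodalPredictor:
    """General-purpose baseline: one 2-bit saturating counter per branch PC."""

    def __init__(self) -> None:
        self.counters: Dict[int, int] = {}

    def predict(self, pc: int) -> bool:
        # Counters start at 2 (weakly taken); values 2 and 3 predict taken.
        return self.counters.get(pc, 2) >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters.get(pc, 2)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)


class LoopExitComponent:
    """Hypothetical application-specific component for one loop branch.

    It learns how many times the branch is taken before it falls through,
    then predicts the loop exit. Stands in for logic mapped onto RF.
    """

    def __init__(self, target_pc: int) -> None:
        self.target_pc = target_pc
        self.trip_count: Optional[int] = None  # learned length of the taken run
        self.run = 0  # taken outcomes seen in the current run

    def intervene(self, pc: int) -> Optional[bool]:
        """Fetch-side hook (assumed): return an override, or None to defer."""
        if pc != self.target_pc or self.trip_count is None:
            return None
        return self.run < self.trip_count

    def observe(self, pc: int, taken: bool) -> None:
        """Retire-side hook (assumed): watch resolved branch outcomes."""
        if pc != self.target_pc:
            return
        if taken:
            self.run += 1
        else:
            self.trip_count = self.run
            self.run = 0


def count_mispredictions(
    trace: List[Tuple[int, bool]],
    component: Optional[LoopExitComponent] = None,
) -> int:
    """Replay a (pc, taken) trace; the component may override the baseline."""
    baseline = BimodalPredictor()
    misses = 0
    for pc, taken in trace:
        prediction = baseline.predict(pc)
        if component is not None:
            override = component.intervene(pc)
            if override is not None:
                prediction = override
        if prediction != taken:
            misses += 1
        # Simplification: outcomes are visible immediately, with no pipeline delay.
        baseline.update(pc, taken)
        if component is not None:
            component.observe(pc, taken)
    return misses


# Synthetic trace: a loop branch taken 7 times, then not taken, repeated 10 times.
LOOP_BRANCH_PC = 0x400
trace = ([(LOOP_BRANCH_PC, True)] * 7 + [(LOOP_BRANCH_PC, False)]) * 10

baseline_misses = count_mispredictions(trace)
custom_misses = count_mispredictions(trace, LoopExitComponent(LOOP_BRANCH_PC))
print(baseline_misses, custom_misses)
```

On this trace the baseline mispredicts every loop exit, while the custom component mispredicts only the first exit, before it has learned the trip count. Every branch still passes through the same prediction path; the component only overrides the outcome for the one branch it was built for, which mirrors the intervention model described above.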
