Coupling latency-insensitivity with variable-latency for better than worst case design: a RISC case study

The gap between worst and typical case delays is bound to increase in nanometer scale technologies due to the spread in process manufacturing parameters. To still profit from scaling, designs should tolerate worst case delays seamlessly and with a minimum performance degradation with respect to the typical case. We present a simple RISC core which tolerates worst case extra latency using the Latency-Insensitive Design approach coupled to a Variable-Latency mechanism. Stalls caused by excessive delay, by data and control hazards and by late memory access are dealt with in a uniform way. Compared to a pure worst-case approach, our design method permits to increase the core clock frequency by 23% in a 45 nm CMOS technology, without area and power penalty.

[1]  David Bañeres,et al.  Variable-latency design by function speculation , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[2]  Nan Jiang,et al.  A MIPS R2000 implementation , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[3]  Alberto L. Sangiovanni-Vincentelli,et al.  A methodology for correct-by-construction latency insensitive design , 1999, 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051).

[4]  David Blaauw,et al.  Razor II: In Situ Error Detection and Correction for PVT and SER Tolerance , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[5]  Jordi Cortadella,et al.  Synthesis of synchronous elastic architectures , 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[6]  Kaushik Roy,et al.  CRISTA: A New Paradigm for Low-Power, Variation-Tolerant, and Adaptive Circuit Synthesis Using Critical Path Isolation , 2007, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[7]  David Blaauw,et al.  Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation , 2003, MICRO.

[8]  H. T. Kung,et al.  A Regular Layout for Parallel Adders , 1982, IEEE Transactions on Computers.

[9]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[10]  David M. Brooks,et al.  Mitigating the Impact of Process Variations on Processor Register Files and Execution Units , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[11]  Mauro Olivieri,et al.  Design of synchronous and asynchronous variable-latency pipelined multipliers , 2001, IEEE Trans. Very Large Scale Integr. Syst..

[12]  Kaushik Roy,et al.  Trifecta: A Nonspeculative Scheme to Exploit Common, Data-Dependent Subcritical Paths , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[13]  Pradip Bose,et al.  Synchronous interlocked pipelines , 2002, Proceedings Eighth International Symposium on Asynchronous Circuits and Systems.

[14]  Paul I. Pénzes,et al.  The design of an asynchronous MIPS R3000 microprocessor , 1997, Proceedings Seventeenth Conference on Advanced Research in VLSI.