ReCycle: pipeline adaptation to tolerate process variation

Process variation affects processor pipelines by making some stages slower and others faster, thereby exacerbating pipeline imbalance. This reduces the frequency attainable by the pipeline. To improve performance, this paper proposes ReCycle, an architectural framework that comprehensively applies cycle time stealing to the pipeline, transferring the time slack of the faster stages to the slower ones by skewing clock arrival times to latching elements after fabrication. As a result, the pipeline can be clocked with a period equal to the average stage delay rather than the longest one. In addition, ReCycle's frequency gains are enhanced with donor stages, which are empty stages added to "donate" slack to the slow stages. Finally, ReCycle can also convert slack into power reductions. For a 17FO4 pipeline, ReCycle increases the frequency by 12% and the application performance by 9% on average. Combining ReCycle and donor stages delivers improvements of 36% in frequency and 15% in performance on average, completely reclaiming the performance losses due to variation.
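
The difference between clocking at the longest stage delay and clocking at the average one can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a single pipeline loop, ignores setup and hold times and any limit on how far a clock edge can be skewed, and treats a donor stage as adding a latch but no logic delay. The function names and the stage delays are hypothetical.

```python
def period_without_stealing(stage_delays):
    """Conventional clocking: the slowest stage sets the period."""
    return max(stage_delays)


def period_with_stealing(stage_delays, donor_stages=0):
    """Idealized cycle time stealing over a single loop.

    Slack moves freely between stages, so the period is the total logic
    delay divided by the number of stages in the loop. Each donor stage
    (assumption: zero logic delay) adds one more stage to share the delay.
    """
    return sum(stage_delays) / (len(stage_delays) + donor_stages)


# Hypothetical post-fabrication stage delays, in picoseconds.
delays = [100, 130, 90, 120]

baseline = period_without_stealing(delays)               # 130
stolen = period_with_stealing(delays)                    # 110.0
with_donor = period_with_stealing(delays, donor_stages=1)  # 88.0

print(baseline, stolen, with_donor)  # prints: 130 110.0 88.0
```

In this toy case, stealing shortens the period from 130 ps to 110 ps, and one donor stage shortens it further to 88 ps at the cost of an extra pipeline stage. That added latency is consistent with the abstract reporting a smaller gain in application performance than in frequency.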
