Impact of Die-to-Die and Within-Die Parameter Variations on the Clock Frequency and Throughput of Multi-Core Processors

A statistical performance simulator is developed to explore the impact of parameter variations on the maximum clock frequency (FMAX) and throughput distributions of multi-core processors in a future 22 nm technology. The simulator captures the effects of die-to-die (D2D) and within-die (WID) transistor and interconnect parameter variations on critical path delays in a die. A key component of the simulator is an analytical multi-core processor throughput model, which enables computationally efficient and accurate throughput calculations, as compared with cycle-accurate performance simulators, for single-threaded and highly parallel multi-threaded (MT) workloads. Based on microarchitecture designs from previous microprocessors, three multi-core processors with small, medium, or large cores are projected for the 22 nm technology generation to investigate a range of design options. These three multi-core processors are optimized for maximum throughput within a constant die area. A traditional single-core processor is also scaled to the 22 nm technology to provide a baseline comparison. The salient conclusions of this paper are that 1) product-level variation analysis for multi-core processors must focus on throughput, rather than just FMAX, and 2) multi-core processors are more variation tolerant than single-core processors due to the larger impact of memory latency and bandwidth on throughput. In support of these two points, statistical simulations indicate that multi-core and single-core processors with an equivalent total core area have similar FMAX distributions (mean degradation of 9% and standard deviation of 5%) for MT applications. In contrast to single-core processors, memory latency and bandwidth constraints significantly limit the throughput dependency on FMAX in multi-core processors, thus reducing the throughput mean degradation and standard deviation by ~50% for the small and medium core designs and by ~30% for the large core design. This improvement in the throughput distribution indicates that multi-core processors could significantly reduce the product design and process development complexities due to parameter variations as compared to single-core processors, enabling faster time to market for high-performance microprocessor products.
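The mechanism described above, in which FMAX is set by the slowest critical path while memory time limits how strongly throughput tracks FMAX, can be illustrated with a toy Monte Carlo sketch. This is not the paper's simulator or its analytical throughput model: the Gaussian delay model, the sigma values, the path count, and the `mem_fraction` parameter are all assumptions chosen only for illustration.

```python
import random
import statistics


def sample_fmax(rng, n_paths=100, sigma_d2d=0.05, sigma_wid=0.05):
    """Normalized FMAX of one die (1.0 = nominal), hypothetical Gaussian model.

    The D2D shift is shared by every path on the die; each path also gets
    its own independent WID shift. The slowest path sets the clock.
    """
    d2d = rng.gauss(0.0, sigma_d2d)
    worst_delay = max(1.0 + d2d + rng.gauss(0.0, sigma_wid)
                      for _ in range(n_paths))
    return 1.0 / worst_delay


def throughput(fmax, mem_fraction):
    """Normalized throughput for a given normalized FMAX.

    mem_fraction (assumed) is the share of nominal run time spent waiting
    on memory; that share does not shrink or grow with the core clock.
    """
    run_time = (1.0 - mem_fraction) / fmax + mem_fraction
    return 1.0 / run_time


def simulate(n_dies=1000, mem_fraction=0.5, seed=0):
    """Return per-die lists of normalized FMAX and normalized throughput."""
    rng = random.Random(seed)
    fmax = [sample_fmax(rng) for _ in range(n_dies)]
    tput = [throughput(f, mem_fraction) for f in fmax]
    return fmax, tput


if __name__ == "__main__":
    fmax, tput = simulate()
    for name, xs in (("FMAX", fmax), ("throughput", tput)):
        degradation = 1.0 - statistics.mean(xs)
        print(f"{name}: mean degradation {degradation:.3f}, "
              f"std {statistics.pstdev(xs):.3f}")
```

In this sketch, setting `mem_fraction` to 0 makes throughput follow FMAX exactly, while larger values compress both the mean degradation and the spread of throughput, which is the qualitative effect the abstract attributes to memory latency and bandwidth constraints.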
