Achieving uniform performance and maximizing throughput in the presence of heterogeneity

Continued scaling of process technologies is critical to sustaining improvements in processor frequencies and performance. However, shrinking process technologies exacerbates process variations — the deviation of process parameters from their target specifications. In the context of multi-core CMPs, which are implemented to feature homogeneous cores, within-die process variations result in substantially different core frequencies. Exposing such process-variation induced heterogeneity interferes with the norm of marketing chips at a single frequency. Further, application performance is undesirably dictated by the frequency of the core it is running on. To work around these challenges, a single uniform frequency, dictated by the slowest core, is currently chosen as the chip frequency sacrificing the increased performance capabilities of cores that could operate at higher frequencies. In this paper, we propose choosing the mean frequency across all cores, in lieu of the minimum frequency, as the single-frequency to use as the chip sales frequency. We examine several scheduling algorithms implemented below the O/S in hardware/firmware that guarantee minimum application performance near that of the average frequency, by masking process-variation induced heterogeneity from the end-user. We show that our Throughput-Driven Fairness (TDF) scheduling policy improves throughput by an average of 12% compared to a naive fairness scheme (round-robin) for frequency-sensitive applications. At the same time, TDF allows 98% of chips to maintain minimum performance at or above 90% of that expected at the mean frequency, providing a single uniform performance level to present for the chip.

[1]  Manoj Franklin,et al.  Balancing thoughput and fairness in SMT processors , 2001, 2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS..

[2]  Michael L. Scott,et al.  Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[3]  Onur Mutlu,et al.  Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[4]  Belliappa Kuttanna,et al.  A Sub-2 W Low Power IA Processor for Mobile Internet Devices in 45 nm High-k Metal Gate CMOS , 2009, IEEE Journal of Solid-State Circuits.

[5]  Liang-Teck Pang,et al.  Measurements and analysis of process variability in 90nm CMOS , 2006, 2006 8th International Conference on Solid-State and Integrated Circuit Technology Proceedings.

[6]  Margaret Martonosi,et al.  Long-term workload phases: duration predictions and applications to DVFS , 2005, IEEE Micro.

[7]  Jonathan A. Winter,et al.  Scheduling algorithms for unpredictably heterogeneous CMP architectures , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[8]  T. N. Vijaykumar,et al.  Heat-and-run: leveraging SMT and CMP to manage power density through the operating system , 2004, ASPLOS XI.

[9]  Josep Torrellas,et al.  Facelift: Hiding and slowing down aging in multicores , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[10]  Tao Li,et al.  Informed Microarchitecture Design Space Exploration Using Workload Dynamics , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[11]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[12]  Saurabh Dighe,et al.  Within-die variation-aware dynamic-voltage-frequency scaling core mapping and thread hopping for an 80-core processor , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[13]  James D. Meindl,et al.  Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration , 2002, IEEE J. Solid State Circuits.

[14]  Shantanu Gupta,et al.  Architectural core salvaging in a multi-core processor for hard-error tolerance , 2009, ISCA '09.

[15]  Liang-Teck Pang,et al.  Impact of Layout on 90nm CMOS Process Parameter Fluctuations , 2006, 2006 Symposium on VLSI Circuits, 2006. Digest of Technical Papers..

[16]  Margaret Martonosi,et al.  An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[17]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[18]  Diana Marculescu,et al.  Variation-aware dynamic voltage/frequency scaling , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[19]  Christine A. Shoemaker,et al.  Scalable thread scheduling and global power management for heterogeneous many-core architectures , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[20]  Gu-Yeon Wei,et al.  Thread motion: fine-grained power management for multi-core systems , 2009, ISCA '09.

[21]  M TullsenDean,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000 .

[22]  S. Naffziger,et al.  Power and temperature control on a 90-nm Itanium family processor , 2006, IEEE Journal of Solid-State Circuits.

[23]  Kevin Skadron,et al.  Impact of Process Variations on Multicore Performance Symmetry , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[24]  J. Torrellas,et al.  VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects , 2008, IEEE Transactions on Semiconductor Manufacturing.

[25]  Rajiv Kapoor,et al.  Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[26]  Krste Asanovic,et al.  Reducing power density through activity migration , 2003, ISLPED '03.

[27]  Josep Torrellas,et al.  Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors , 2008, 2008 International Symposium on Computer Architecture.