论文信息 - Mitigating Amdahl's law through EPI throttling

Mitigating Amdahl's law through EPI throttling

This paper is motivated by three recent trends in computer design. First, chip multi-processors (CMPs) with increasing numbers of CPU cores per chip are becoming common. Second, multi-threaded software that can take advantage of CMPs will soon become prevalent. Due to the nature of the algorithms, these multi-threaded programs inherently will have phases of sequential execution; Amdahl's law dictates that the speedup of such parallel programs will be limited by the sequential portion of the computation. Finally, increasing levels of on-chip integration coupled with a slowing rate of reduction in supply voltage make power consumption a first order design constraint. Given this environment, our goal is to minimize the execution times of multi-threaded programs containing nontrivial parallel and sequential phases, while keeping the CMP's total power consumption within a fixed budget. In order to mitigate the effects of Amdahl's law, in this paper we make a compelling case for varying the amount of energy expended to process instructions according to the amount of available parallelism. Using the equation, Power-Energy per instruction (EPI) * Instructions per second (IPS), we propose that during phases of limited parallelism (low IPS) the chip multi-processor will spend more EPI; similarly, during phases of higher parallelism (high IPS) the chip multi-processor will spend less EPI; in both scenarios power is fixed. We evaluate the performance benefits of an EPI throttle on an asymmetric multiprocessor (AMP) prototyped from a physical 4-way Xeon SMP server. Using a wide range of multi-threaded programs, we show a 38% wall clock speedup on an AMP compared to a standard SMP that uses the same power. We also measure the supply current on a 4-way SMP server while running the multi-threaded programs and use the measured data as input to a software simulator that implements a more flexible EPI throttle. The results from the measurement-driven simulation show performance benefits comparable to the AMP prototype. We analyze the results from both techniques, explain why and when an EPI throttle works well, and conclude with a discussion of the challenges in building practical EPI throttles.

John Paul Shen | Ed Grochowski | Murali Annavaram

[1] John Paul Shen,et al. Best of both latency and throughput , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..

[2] Michael Stonebraker,et al. The design of POSTGRES , 1986, SIGMOD '86.

[3] Virgílio A. F. Almeida,et al. Cost-performance analysis of heterogeneity in supercomputer architectures , 1990, Proceedings SUPERCOMPUTING '90.

[4] Thomas D. Burd,et al. Energy efficient CMOS microprocessor design , 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.

[5] S. Borkar,et al. Dynamic-sleep transistor and body bias for active leakage power control of microprocessors , 2003, 2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC..

[6] Norman P. Jouppi,et al. Processor Power Reduction Via Single-ISA , 2003 .

[7] Stephen H. Gunther,et al. Managing the Impact of Increasing Microprocessor Power Consumption , 2001 .

[8] Renato J. O. Figueiredo,et al. Impact of heterogeneity on DSM performance , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[9] Margaret Martonosi,et al. Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data , 2003, MICRO.

[10] Norman P. Jouppi,et al. Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[11] Rudolf Eigenmann,et al. SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance , 2001, WOMPAT.

[12] Dirk Grunwald,et al. Pipeline gating: speculation control for energy reduction , 1998, ISCA.

[13] Margaret Martonosi,et al. Dynamic thermal management for high-performance microprocessors , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[14] Michael C. Huang,et al. Dynamically Tuning Processor Resources with Adaptive Processing , 2003, Computer.

[15] Kunle Olukotun,et al. The Stanford Hydra CMP , 2000, IEEE Micro.

[16] E. Myers,et al. Basic local alignment search tool. , 1990, Journal of molecular biology.

[17] Jian Li,et al. Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[18] Luiz André Barroso,et al. Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).