Asymmetric Chip Multiprocessors: Balancing Hardware Efficiency and Programmer Efficiency

Chip multiprocessors (CMPs) are becoming common as the cost of increasing chip power begins to limit single-core performance. The most power-efficient CMP consists of low-power in-order cores. However, performance on such a processor is low unless the workload is almost completely parallelized, which, depending on the workload, can be impossible or can require significant programmer effort. This paper argues that the programmer effort required to parallelize an application can be reduced if the underlying architecture promises faster execution of the serial portion of the application. In that case, programmers can parallelize only the portions that are easy to parallelize and rely on the hardware to run the serial portion faster. We make a case for an architecture that contains one high-performance out-of-order processor and multiple low-performance in-order processors. We call it an Asymmetric Chip Multiprocessor (ACMP). Although the out-of-order core makes the ACMP less power efficient, it enables the ACMP to deliver higher performance gains with less programmer effort.
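The trade-off described above can be illustrated with a simple Amdahl's-law model. The sketch below is not the paper's evaluation methodology. It assumes, purely for illustration, a fixed area budget measured in small-core units, a large out-of-order core that occupies the area of 4 small cores and runs serial code 2x faster, and a parallel portion that runs only on the small cores. The function names and the `big_area` and `big_perf` parameters are hypothetical.

```python
def cmp_speedup(parallel_fraction, n_small):
    """Amdahl speedup of n_small identical small cores over one small core."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_small)


def acmp_speedup(parallel_fraction, area_budget, big_area=4, big_perf=2.0):
    """Speedup of one big core plus small cores filling the remaining area.

    Assumptions (illustrative only): the serial portion runs on the big
    core at big_perf times small-core speed; the parallel portion runs
    only on the remaining small cores.
    """
    n_small = area_budget - big_area
    if n_small < 1:
        raise ValueError("area budget leaves no room for small cores")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial / big_perf + parallel_fraction / n_small)


if __name__ == "__main__":
    area = 16  # total area in small-core units (assumed)
    for p in (0.5, 0.9, 0.99):
        print(f"parallel fraction {p:.2f}: "
              f"CMP {cmp_speedup(p, area):.2f}x, "
              f"ACMP {acmp_speedup(p, area):.2f}x")
```

Under these assumptions the ACMP is faster when a large share of the program remains serial, while the all-small-core CMP is faster only when the program is almost fully parallelized, which is the regime that demands the most programmer effort.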
