Modeling critical sections in Amdahl's law and its implications for multicore design

This paper presents a fundamental law for parallel performance: it shows that parallel performance is limited not only by sequential code (as suggested by Amdahl's law) but also, fundamentally, by synchronization through critical sections. Extending Amdahl's software model to include critical sections, we derive the surprising result that the impact of critical sections on parallel performance can be modeled as a completely sequential part and a completely parallel part. The sequential part is determined by the probability of entering a critical section and the contention probability (i.e., the probability that multiple threads want to enter the same critical section at the same time). This fundamental result reveals at least three important insights for multicore design. (i) Asymmetric multicore processors deliver smaller performance benefits relative to symmetric processors than Amdahl's law suggests, and in some cases even worse performance. (ii) Amdahl's law suggests many tiny cores for optimum performance in asymmetric processors; however, we find that fewer but larger small cores can yield substantially better performance. (iii) Executing critical sections on the big core can yield substantial speedups; however, performance is sensitive to the accuracy of the critical section contention predictor.
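The split into a sequential and a parallel part can be illustrated with a minimal Python sketch. It rests on assumptions of our own, not on the paper's exact equations: the serialized share of the parallel fraction is taken to be f_par * p_cs * p_ctn, where p_cs is the probability that parallel code is inside a critical section and p_ctn is the contention probability, both treated as fixed inputs. The asymmetric variant borrows the common assumptions from Hill and Marty's multicore model (a chip budget of n base core equivalents, or BCEs, and a big core of r BCEs with performance sqrt(r)). All function and parameter names are hypothetical, and the printed numbers are illustrative only.

```python
import math


def perf(r: float) -> float:
    """Assumed performance of a core built from r BCEs (sqrt rule, as in Hill and Marty)."""
    return math.sqrt(r)


def amdahl_speedup(f_par: float, n: int) -> float:
    """Classic Amdahl's law on n identical single-BCE cores."""
    return 1.0 / ((1.0 - f_par) + f_par / n)


def cs_speedup(f_par: float, p_cs: float, p_ctn: float, n: int) -> float:
    """Amdahl's law with critical sections on n single-BCE cores (simplified sketch).

    ASSUMPTION: contended critical sections (share f_par * p_cs * p_ctn) are
    fully serialized; the remaining parallel code scales with n.
    """
    serialized = f_par * p_cs * p_ctn
    parallel = f_par - serialized
    return 1.0 / ((1.0 - f_par) + serialized + parallel / n)


def asym_cs_speedup(f_par: float, p_cs: float, p_ctn: float, n: int, r: int,
                    cs_on_big_core: bool = False) -> float:
    """Asymmetric chip of n BCEs: one big core of r BCEs plus n - r small cores.

    ASSUMPTIONS: sequential code runs on the big core; contended critical
    sections run on a small core (speed 1) unless cs_on_big_core is True, in
    which case they run at perf(r); all other parallel code uses every core,
    for an aggregate throughput of perf(r) + n - r.
    """
    serialized = f_par * p_cs * p_ctn
    parallel = f_par - serialized
    cs_speed = perf(r) if cs_on_big_core else 1.0
    time = ((1.0 - f_par) / perf(r)
            + serialized / cs_speed
            + parallel / (perf(r) + n - r))
    return 1.0 / time


if __name__ == "__main__":
    # Hypothetical workload: 99% parallel code, 10% of it in critical
    # sections, half of those critical-section entries contended.
    f_par, p_cs, p_ctn, n = 0.99, 0.1, 0.5, 64
    print(f"Amdahl:                 {amdahl_speedup(f_par, n):.1f}")
    print(f"with critical sections: {cs_speedup(f_par, p_cs, p_ctn, n):.1f}")
    for r in (4, 16):
        small = asym_cs_speedup(f_par, p_cs, p_ctn, n, r)
        big = asym_cs_speedup(f_par, p_cs, p_ctn, n, r, cs_on_big_core=True)
        print(f"asymmetric r={r}: CS on small core {small:.1f}, on big core {big:.1f}")
```

Under these assumptions, speedup with critical sections is bounded by 1 / ((1 - f_par) + f_par * p_cs * p_ctn) however many cores are added, and moving contended critical sections to the big core shrinks the serialized term by a factor of perf(r), which mirrors the direction of insights (i) and (iii).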
