Variation aware cache partitioning for multithreaded programs

Multithreaded programs are commonly written and optimized for homogeneous multi-core processors assuming equal performance from all the cores. This assumption greatly simplifies the partitioning and balancing of an application's workload across threads; however, it no longer holds when the frequencies of the cores differ due to within-die variations, leading to a degradation in performance. We observe that, in addition to the frequency of the core that it executes on, the performance of a thread is also dependent on the share of shared system resources, such as last-level cache, that it receives. We propose variation-aware cache partitioning as an approach to redress the variation-induced imbalance in the execution times of threads, thereby improving the performance of multi-threaded programs. We discuss the challenges involved in realizing our proposal, including synchronization (e.g., barriers) across threads, which results in faster threads being limited by slower threads, the complex and non-linear relationship between a thread's performance and the cache capacity allocated to it, and the fact that different program phases, can respond quite differently to varying cache capacity. We propose a runtime scheme to perform spatio-temporal cache partitioning while considering both chip characteristics (frequency variations) and program characteristics. We evaluate the proposed technique by applying it to an ensemble of variation-impacted multi-cores executing multi-threaded programs from the PAR-SEC and SPEC-OMP suites, and demonstrate that it results in an average performance improvement of 15% by mitigating the impact of frequency variations.

[1]  Mahmut T. Kandemir,et al.  A helper thread based dynamic cache partitioning scheme for multithreaded applications , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[2]  Yan Solihin,et al.  Quality of service shared cache management in chip multiprocessor architecture , 2010, TACO.

[3]  James E. Smith,et al.  A performance counter architecture for computing accurate CPI components , 2006, ASPLOS XII.

[4]  Puneet Gupta,et al.  Variation-aware speed binning of multi-core processors , 2010, 2010 11th International Symposium on Quality Electronic Design (ISQED).

[5]  Margaret Martonosi,et al.  Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors , 2009, ISCA '09.

[6]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[7]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[8]  Yan Solihin,et al.  Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[9]  Rudolf Eigenmann,et al.  SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance , 2001, WOMPAT.

[10]  Daniel Aarno,et al.  Full-System Simulation from Embedded to High-Performance Systems , 2010 .

[11]  Vijay S. Pai,et al.  Imbalanced cache partitioning for balanced data-parallel programs , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12]  Diana Marculescu,et al.  Variation-aware dynamic voltage/frequency scaling , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[13]  Krishna K. Rangan,et al.  Achieving uniform performance and maximizing throughput in the presence of heterogeneity , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[14]  Saurabh Dighe,et al.  Within-Die Variation-Aware Dynamic-Voltage-Frequency-Scaling With Optimal Core Allocation and Thread Hopping for the 80-Core TeraFLOPS Processor , 2011, IEEE Journal of Solid-State Circuits.

[15]  Mahmut T. Kandemir,et al.  Intra-application cache partitioning , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[16]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[17]  J. Torrellas,et al.  VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects , 2008, IEEE Transactions on Semiconductor Manufacturing.

[18]  Clifford Stein,et al.  Introduction to Algorithms, 2nd edition. , 2001 .

[19]  Josep Torrellas,et al.  Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors , 2008, 2008 International Symposium on Computer Architecture.

[20]  G. Edward Suh,et al.  Dynamic Partitioning of Shared Cache Memory , 2004, The Journal of Supercomputing.