PMCTrack: Delivering Performance Monitoring Counter Support to the OS Scheduler

Hardware performance monitoring counters (PMCs) have proven effective in characterizing application performance. Because PMCs can only be accessed directly at the OS privilege level, kernellevel tools must be developed to enable the end-user and userspace programs to access PMCs. A large body of work has demonstrated that the OS can perform effective runtime optimizations in multicore systems by leveraging performance-counter data. Special attention has been paid to optimizations in the OS scheduler. While existing performance monitoring tools greatly simplify the collection of PMC application data from userspace, they do not provide an architecture-agnostic kernel-level mechanism that is capable of exposing high-level PMC metrics to OS components, such as the scheduler. As a result, the implementation of PMC-based OS scheduling schemes is typically tied to specific processor models. To address this shortcoming we present PMCTrack, a novel tool for the Linux kernel that provides a simple architecture-independent mechanism that makes it possible for the OS scheduler to access per-thread PMC data. Despite being an OSoriented tool, PMCTrack still allows the gathering of monitoring data from userspace, enabling kernel developers to carry out the necessary offline analysis and debugging to assist them during the scheduler design process. In addition, the tool provides both the OS and the user-space PMCTrack components with other insightful metrics available in modern processors and which are not directly exposed as PMCs, such as cache occupancy or energy consumption. This information is also of great value when it comes to analyzing the potential benefits of novel scheduling policies on real systems. In this paper, we analyze different case studies that demonstrate the flexibility, simplicity and powerful features of PMCTrack.

[1]  Vanchinathan Venkataramani,et al.  Power-performance modeling on asymmetric multi-cores , 2013, 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).

[2]  Onur Mutlu,et al.  Utility-based acceleration of multithreaded applications on asymmetric CMPs , 2013, ISCA.

[3]  Manuel Prieto,et al.  A comprehensive scheduler for asymmetric multicore systems , 2010, EuroSys '10.

[4]  Dheeraj Reddy,et al.  Bias scheduling in heterogeneous multi-core architectures , 2010, EuroSys '10.

[5]  Mark D. Hill,et al.  Amdahl's Law in the Multicore Era , 2008 .

[6]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[7]  Onur Mutlu,et al.  Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[8]  Li Zhao,et al.  QuickIA: Exploring heterogeneous architectures on real prototypes , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[9]  Tong Li,et al.  Operating system support for overlapping-ISA heterogeneous multi-core architectures , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[10]  Erik Hagersten,et al.  StatCache: a probabilistic approach to efficient and accurate data locality analysis , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[11]  Gerhard Wellein,et al.  LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments , 2010, 2010 39th International Conference on Parallel Processing Workshops.

[12]  J. Friedman Stochastic gradient boosting , 2002 .

[13]  John Paul Shen,et al.  Mitigating Amdahl's law through EPI throttling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[14]  Yale N. Patt,et al.  Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[15]  Frank Bellosa,et al.  Resource-conscious scheduling for energy efficiency on multicore processors , 2010, EuroSys '10.

[16]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[17]  Tong Li,et al.  Using OS Observations to Improve Performance in Multicore Systems , 2008, IEEE Micro.

[18]  Tao Li,et al.  Informed Microarchitecture Design Space Exploration Using Workload Dynamics , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[19]  Pedro Tomás,et al.  SchedMon: A Performance and Energy Monitoring Tool for Modern Multi-cores , 2014, Euro-Par Workshops.

[20]  Avi Mendelson,et al.  Fairness and Throughput in Switch on Event Multithreading , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[21]  David A. Wood,et al.  Reuse-based online models for caches , 2013, SIGMETRICS '13.

[22]  O. Mutlu,et al.  Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems , 2010, ASPLOS XV.

[23]  Michael Stumm,et al.  RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations , 2009, ASPLOS.

[24]  Alexandra Fedorova,et al.  AKULA: A toolset for experimenting and developing thread placement algorithms on multicore systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[25]  Juan Carlos Saez,et al.  Exploring the Throughput-Fairness Trade-off on Asymmetric Multicore Systems , 2014, Euro-Par Workshops.

[26]  Juan Carlos Saez,et al.  ACFS: a completely fair scheduler for asymmetric single-isa multicore systems , 2015, SAC.

[27]  Stefanos Kaxiras,et al.  Green governors: A framework for Continuously Adaptive DVFS , 2011, 2011 International Green Computing Conference and Workshops.

[28]  Stacey Jeffery,et al.  HASS: a scheduler for heterogeneous multicore systems , 2009, OPSR.

[29]  Rami G. Melhem,et al.  Energy-Efficient Thread Assignment Optimization for Heterogeneous Multicore Systems , 2015, ACM Trans. Embed. Comput. Syst..

[30]  S Jarp,et al.  Perfmon2: a leap forward in performance monitoring , 2008 .

[31]  Daniel Mossé,et al.  Lucky Scheduling for Energy-Efficient Heterogeneous Multi-Core Systems , 2012, HotPower.

[32]  Onur Mutlu,et al.  Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems , 2012, TOCS.

[33]  Lieven Eeckhout,et al.  Scheduling heterogeneous multi-cores through performance impact estimation (PIE) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[34]  Manuel Prieto,et al.  Leveraging Core Specialization via OS Scheduling to Improve Performance on Asymmetric Multicore Systems , 2012, TOCS.

[35]  Soraya Ghiasi,et al.  Scheduling for heterogeneous processors in server systems , 2005, CF '05.

[36]  Manuel Prieto,et al.  Leveraging workload diversity through OS scheduling to maximize performance on single-ISA heterogeneous multicore systems , 2011, J. Parallel Distributed Comput..

[37]  M. Horowitz,et al.  Low-power digital design , 1994, Proceedings of 1994 IEEE Symposium on Low Power Electronics.

[38]  Patrick Crowley,et al.  Dynamic thread assignment on heterogeneous multiprocessor architectures , 2006, CF '06.

[39]  Mark Horowitz,et al.  Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.

[40]  Lieven Eeckhout,et al.  Fairness-aware scheduling on single-ISA heterogeneous multi-cores , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[41]  Margaret Martonosi,et al.  Runtime power monitoring in high-end processors: methodology and empirical data , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[42]  Yan Solihin,et al.  An analytical model for cache replacement policy performance , 2006, SIGMETRICS '06/Performance '06.

[43]  Quan Chen,et al.  Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures , 2014, TACO.

[44]  Sally A. McKee,et al.  Real time power estimation and thread scheduling via performance counters , 2009, CARN.

[45]  Bin Li,et al.  Cross-architecture prediction based scheduling for energy efficient execution on single-ISA heterogeneous chip-multiprocessors , 2015, Microprocess. Microsystems.

[46]  Alexandra Fedorova,et al.  Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.

[47]  Gerhard Wellein,et al.  LIKWID: Lightweight Performance Tools , 2011, CHPC.

[48]  Jack J. Dongarra,et al.  A Portable Programming Interface for Performance Evaluation on Modern Processors , 2000, Int. J. High Perform. Comput. Appl..