Slow and Steady: Measuring and Tuning Multicore Interference

Now ubiquitous, multicore processors provide replicated compute cores that allow independent programs to run in parallel. However, shared resources, such as last-level caches, can cause otherwise-independent programs to interfere with one another, leading to significant and unpredictable effects on their execution time. Indeed, prior work has shown that specially crafted enemy programs can cause software systems of interest to experience orders-of-magnitude slowdowns when both are run in parallel on a multicore processor. This undermines the suitability of these processors for tasks that have real-time constraints. In this work, we explore the design and evaluation of techniques for empirically testing interference using enemy programs, with an eye towards reliability (how reproducible the interference results are) and portability (how interference testing can be effective across chips). We first show that different methods of measurement yield significantly different magnitudes of, and variation in, observed interference effects when applied to an enemy process that was shown to be particularly effective in prior work. We propose a method of measurement based on percentiles and confidence intervals, and show that it provides both competitive and reproducible observations. The reliability of our measurements allows us to explore auto-tuning, where enemy programs are further specialised per architecture. We evaluate three different tuning approaches (random search, simulated annealing, and Bayesian optimisation) on five different multicore chips, spanning x86 and ARM architectures. To show that our tuned enemy programs generalise to applications, we evaluate the slowdowns caused by our approach on the AutoBench and CoreMark benchmark suites. Our method achieves a statistically larger slowdown compared to prior work in 35 out of 105 benchmark/chip combinations, with a maximum difference of $3.8\times$.
We envision that empirical approaches, such as ours, will be valuable for ‘first pass’ evaluations when investigating which multicore processors are suitable for real-time tasks.
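
The abstract does not spell out how the percentile-based measurement is computed, so the following is only a minimal sketch of one standard construction, not the procedure used in this work: a distribution-free confidence interval for a percentile, built from order statistics and the binomial distribution. The function names (`binom_pmf`, `percentile_ci`), the 90th percentile, the 95% confidence level, and the synthetic timings are all assumptions chosen for illustration.

```python
import math
import random


def binom_pmf(k, n, p):
    """P(B = k) for B ~ Binomial(n, p), computed in log space (needs 0 < p < 1)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_comb + k * math.log(p) + (n - k) * math.log(1.0 - p))


def percentile_ci(samples, p=0.9, confidence=0.95):
    """Distribution-free confidence interval for the p-th quantile.

    Returns (low, high), two observed samples chosen so that the true
    quantile lies between them with at least the requested confidence.
    Raises ValueError when there are too few samples to form the interval.
    """
    xs = sorted(samples)
    n = len(xs)
    alpha = 1.0 - confidence
    lower_rank = None  # largest 1-based rank r with P(B <= r - 1) <= alpha / 2
    upper_rank = None  # smallest 1-based rank r with P(B <= r - 1) >= 1 - alpha / 2
    cdf = 0.0
    for k in range(n):
        cdf += binom_pmf(k, n, p)  # cdf is now P(B <= k)
        rank = k + 1
        if cdf <= alpha / 2:
            lower_rank = rank
        if upper_rank is None and cdf >= 1.0 - alpha / 2:
            upper_rank = rank
    if lower_rank is None or upper_rank is None:
        raise ValueError("too few samples for this percentile and confidence")
    return xs[lower_rank - 1], xs[upper_rank - 1]


if __name__ == "__main__":
    # Hypothetical execution times (ms); real use would time the program
    # under test while an enemy program runs on another core.
    random.seed(0)
    times = [random.gauss(100.0, 5.0) for _ in range(200)]
    low, high = percentile_ci(times, p=0.9, confidence=0.95)
    print(f"90th percentile lies in [{low:.2f}, {high:.2f}] ms (95% confidence)")
```

An interval of this kind makes no assumption about the shape of the timing distribution, which is one reason it may suit interference measurements, where execution times are often skewed or multimodal. Whether the interval is narrow enough to compare two configurations depends on the number of runs collected.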
