Denial-of-Service Attacks on Shared Cache in Multicore: Analysis and Prevention

In this paper we investigate the feasibility of denial-of-service (DoS) attacks on shared caches in multicore platforms. With carefully engineered attacker tasks, we are able to cause more than 300X execution time increases on a victim task running on a dedicated core on a popular embedded multicore platform, regardless of whether we partition its shared cache or not. Based on careful experimentation on real and simulated multicore platforms, we identify an internal hardware structure of a non-blocking cache, namely the cache writeback buffer, as a potential target of shared cache DoS attacks. We propose an OS-level solution to prevent such DoS attacks by extending a state-of-the-art memory bandwidth regulation mechanism. We implement the proposed mechanism in Linux on a real multicore platform and show its effectiveness in protecting against cache DoS attacks.

[1]  Onur Mutlu,et al.  MISE: Providing performance predictability and improving fairness in shared main memory systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[2]  Gernot Heiser,et al.  A survey of microarchitectural timing attacks and countermeasures on contemporary hardware , 2016, Journal of Cryptographic Engineering.

[3]  Serge J. Belongie,et al.  SD-VBS: The San Diego Vision Benchmark Suite , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[4]  Yanick Fratantonio,et al.  Drammer: Deterministic Rowhammer Attacks on Mobile Platforms , 2016, CCS.

[5]  James H. Anderson,et al.  Attacking the one-out-of-m multicore problem by combining hardware management with mixed-criticality provisioning , 2016, 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS).

[6]  Chris Fallin,et al.  Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[7]  Hovav Shacham,et al.  Comprehensive Experimental Analyses of Automotive Attack Surfaces , 2011, USENIX Security Symposium.

[8]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[9]  Jochen Liedtke,et al.  OS-controlled cache predictability for real-time systems , 1997, Proceedings Third IEEE Real-Time Technology and Applications Symposium.

[10]  Onur Mutlu,et al.  Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[11]  Henrik Theiling,et al.  Multi-core Interference-Sensitive WCET Analysis Leveraging Runtime Resource Capacity Enforcement , 2014, 2014 26th Euromicro Conference on Real-Time Systems.

[12]  Michael Hamburg,et al.  Meltdown , 2018, meltdownattack.com.

[13]  James H. Anderson,et al.  Outstanding Paper Award: Making Shared Caches More Predictable on Multicore Platforms , 2013, 2013 25th Euromicro Conference on Real-Time Systems.

[14]  Mikko H. Lipasti,et al.  Modern Processor Design: Fundamentals of Superscalar Processors , 2002 .

[15]  Rodolfo Pellizzoni,et al.  PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore platforms , 2014, 2014 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).

[16]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[17]  Marco Caccamo,et al.  Real-time cache management framework for multi-core architectures , 2013, 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).

[18]  Michael Hamburg,et al.  Spectre Attacks: Exploiting Speculative Execution , 2018, 2019 IEEE Symposium on Security and Privacy (SP).

[19]  William J. Dally,et al.  Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[20]  Onur Mutlu,et al.  Ramulator: A Fast and Extensible DRAM Simulator , 2016, IEEE Computer Architecture Letters.

[21]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[22]  Rodolfo Pellizzoni,et al.  Parallelism-Aware Memory Interference Delay Analysis for COTS Multicore Systems , 2014, 2015 27th Euromicro Conference on Real-Time Systems.

[23]  Mor Harchol-Balter,et al.  Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[24]  Richard E. Kessler,et al.  Page placement algorithms for large real-indexed caches , 1992, TOCS.

[25]  Andrew Wolfe,et al.  Software-based cache partitioning for real-time applications , 1994 .

[26]  Francisco J. Cazorla,et al.  parMERASA -- Multi-core Execution of Parallelised Hard Real-Time Applications Supporting Analysability , 2013, 2013 Euromicro Conference on Digital System Design.

[27]  Ragunathan Rajkumar,et al.  A Coordinated Approach for Practical OS-Level Cache Management in Multi-core Real-Time Systems , 2013, 2013 25th Euromicro Conference on Real-Time Systems.

[28]  G. Edward Suh,et al.  A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[29]  Heechul Yun,et al.  DeepPicar: A Low-Cost Deep Neural Network-Based Autonomous Car , 2017, 2018 IEEE 24th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA).

[30]  Lui Sha,et al.  MemGuard: Memory bandwidth reservation system for efficient performance isolation in multi-core platforms , 2013, 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).

[31]  Yan Solihin,et al.  Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.

[32]  Hsien-Hsin S. Lee,et al.  Analyzing Performance Vulnerability due to Resource Denial›of›Service Attack on Chip Multiprocessors , 2007 .

[33]  Heechul Yun,et al.  Taming Non-Blocking Caches to Improve Isolation in Multicore Real-Time Systems , 2016, 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS).

[34]  Onur Mutlu,et al.  Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.

[35]  Rodolfo Pellizzoni,et al.  Memory Servers for Multicore Systems , 2016, 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS).

[36]  David Kroft,et al.  Lockup-free instruction fetch/prefetch cache organization , 1998, ISCA '81.

[37]  Onur Mutlu,et al.  Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems , 2007, USENIX Security Symposium.

[38]  Stefanos Kaxiras,et al.  Preventing Denial-of-Service Attacks in Shared CMP Caches , 2006, SAMOS.

[39]  Ankit Agrawal,et al.  Analysis of Dynamic Memory Bandwidth Regulation in Multi-core Real-Time Systems , 2018, 2018 IEEE Real-Time Systems Symposium (RTSS).

[40]  S. Kim,et al.  Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[41]  Thomas F. Wenisch,et al.  Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution , 2018, USENIX Security Symposium.