WCET Analysis with MRU Caches: Challenging LRU for Predictability

Most previous work on cache analysis for WCET estimation assumes a particular replacement policy, LRU. Much less work has been done on non-LRU policies, since they are generally considered to be very "unpredictable". However, most commercial processors are actually equipped with non-LRU policies, since they are more efficient in terms of hardware cost, power consumption and thermal output, while still maintaining almost as good average-case performance as LRU. In this work, we study the analysis of MRU, a non-LRU replacement policy employed in mainstream processor architectures like Intel Nehalem. Our work shows that the predictability of MRU has been significantly underestimated, mainly because the existing cache analysis techniques and metrics, originally designed for LRU, do not match MRU well. As our main technical contribution, we propose a new cache hit/miss classification, k-Miss, to better capture MRU behavior, and develop formal conditions and efficient techniques to decide which memory accesses are k-Miss. A remarkable feature of our analysis is that the k-Miss classifications under MRU are derived from the analysis result of the same program under LRU. Therefore, our approach inherits all the advantages in efficiency, precision and composability of the state-of-the-art LRU analysis techniques based on abstract interpretation. Experiments with benchmarks show that the WCET estimated by our proposed MRU analysis is rather close to (5% to 20% more than) that obtained by the state-of-the-art LRU analysis, which indicates that MRU is also a good candidate for the cache replacement policy in real-time systems.
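
The abstract does not spell out the MRU policy itself. As a rough illustration only, the sketch below assumes the commonly described bit-based formulation: each cache line carries one MRU-bit that is set on access; when the last 0 bit is set, all other bits are cleared; on a miss, the first line whose bit is 0 is replaced. The function name `simulate_mru_set` and its parameters are hypothetical and are not taken from the paper.

```python
def simulate_mru_set(accesses, ways):
    """Simulate ONE cache set under an assumed bit-based MRU policy.

    Returns a list of "hit"/"miss" strings, one per access.
    This is an illustrative sketch, not the paper's analysis.
    """
    if ways < 2:
        raise ValueError("this sketch assumes at least 2 ways per set")

    lines = [None] * ways   # memory block currently held in each way
    bits = [0] * ways       # one MRU-bit per way
    outcome = []

    for block in accesses:
        if block in lines:
            way = lines.index(block)
            outcome.append("hit")
        else:
            # Victim: the first way whose MRU-bit is 0 (one always exists,
            # because the bits are never left all-ones; see below).
            way = bits.index(0)
            lines[way] = block
            outcome.append("miss")

        bits[way] = 1
        if all(bits):
            # The last 0 bit was just set: clear every other bit.
            bits = [0] * ways
            bits[way] = 1

    return outcome


if __name__ == "__main__":
    print(simulate_mru_set(["a", "b", "a", "c", "a"], ways=2))
```

Replaying short access sequences through such a simulator is one way to see why a classification that bounds the number of misses of an access, rather than labelling it always-hit or always-miss, can describe MRU behavior more closely.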
