Data cache locking for tight timing calculations

Caches have become increasingly important as the gap between main memory and processor speeds widens. Small, fast cache memories are designed to bridge this gap, but they are effective only when programs exhibit sufficient data locality. Caches are also a source of unpredictability, so programs sometimes behave differently than expected. Detailed information about the number of cache misses and their causes allows us to predict cache behavior and to detect bottlenecks. Small modifications to the source code may change memory access patterns and thereby alter cache behavior, and code transformations that take cache behavior into account can improve cache performance substantially. However, cache behavior is very hard to predict, which makes optimizing and timing it very difficult. This article proposes and evaluates a new compiler framework that times cache behavior for multitasking systems. Our method explores the use of cache partitioning and dynamic cache locking to provide safe and tight worst-case performance estimates. Cache partitioning divides the cache among tasks to eliminate intertask cache interference. We combine static cache analysis with cache-locking mechanisms to ensure that all intratask conflicts, and consequently memory access times, are exactly predictable. The results of our experiments demonstrate the capability of our framework to describe cache behavior at compile time. We compare our timing approach with a system equipped with a nonpartitioned but statically locked data cache. Our method outperforms static cache locking for all analyzed task sets under various cache architectures, demonstrating that our fully predictable scheme does not compromise the performance of the transformed programs.
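The effect of cache partitioning on intertask interference can be illustrated with a toy model. The sketch below is not the framework described in the abstract: it assumes a direct-mapped cache with 32-byte lines and 8 sets, and the names `Partition`, `count_misses`, `shared_cache`, and `partitioned_cache` are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass, field

LINE_SIZE = 32   # bytes per cache line (assumed for illustration)
TOTAL_SETS = 8   # sets in the whole direct-mapped cache (assumed)


@dataclass
class Partition:
    """A contiguous range of sets in a direct-mapped cache."""
    first_set: int
    num_sets: int
    tags: dict = field(default_factory=dict)  # set index -> cached tag

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        block = address // LINE_SIZE
        index = self.first_set + block % self.num_sets
        tag = block // self.num_sets
        hit = self.tags.get(index) == tag
        self.tags[index] = tag
        return hit


def count_misses(trace, cache_of):
    """trace is a list of (task, address); cache_of maps task -> Partition."""
    misses = {}
    for task, address in trace:
        hit = cache_of[task].access(address)
        misses[task] = misses.get(task, 0) + (0 if hit else 1)
    return misses


def shared_cache():
    """Both tasks compete for every set of the cache."""
    whole = Partition(0, TOTAL_SETS)
    return {"A": whole, "B": whole}


def partitioned_cache():
    """Each task owns half of the sets, so neither can evict the other."""
    half = TOTAL_SETS // 2
    return {"A": Partition(0, half), "B": Partition(half, half)}


# Two tasks whose working sets map to the same sets of the shared cache.
trace_a = [("A", 0), ("A", 32)] * 4
trace_b = [("B", 1024), ("B", 1056)] * 4
interleaved = [ref for pair in zip(trace_a, trace_b) for ref in pair]

alone = count_misses(trace_a, partitioned_cache())
with_partitions = count_misses(interleaved, partitioned_cache())
without_partitions = count_misses(interleaved, shared_cache())
```

In this model, task A incurs the same number of misses in its partition whether it runs alone or interleaved with task B, which is the property that lets each task be analyzed in isolation. In the shared cache, the two tasks evict each other's lines and A's miss count depends on B's behavior.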
