Loop-Based Instruction Prefetching to Reduce the Worst-Case Execution Time

Estimating and optimizing worst-case execution time (WCET) is critical for hard real-time systems to ensure that different tasks can meet their respective deadlines. Recent work has shown that simple prefetching techniques such as the Next-N-Line prefetching can enhance both the average-case and worst-case performance; however, the improvement on the worst-case execution time is rather limited and inefficient. This paper studies a loop-based instruction prefetching approach, which can exploit the program control-flow information to intelligently prefetch instructions that are most likely needed. Our evaluation indicates that the loop-based instruction prefetching outperforms the Next-N-Line prefetching in both the worst-case and the average-case performance for real-time applications.

[1]  James E. Smith,et al.  Prefetching in supercomputer instruction caches , 1992, Proceedings Supercomputing '92.

[2]  Jan Reineke,et al.  Timing predictability of cache replacement policies , 2007, Real-Time Systems.

[3]  Sang Lyul Min,et al.  An accurate worst case timing analysis technique for RISC processors , 1994, 1994 Proceedings Real-Time Systems Symposium.

[4]  Dirk Grunwald,et al.  Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[5]  Jakob Engblom,et al.  The worst-case execution-time problem—overview of methods and survey of tools , 2008, TECS.

[6]  Ting Chen,et al.  WCET centric data allocation to scratchpad memory , 2005, 26th IEEE International Real-Time Systems Symposium (RTSS'05).

[7]  Glenn Reinman,et al.  Fetch directed instruction prefetching , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[8]  John Paul Shen,et al.  Hardware Support for Prescient Instruction Prefetch , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[9]  David B. Whalley,et al.  Bounding worst-case instruction cache performance , 1994, 1994 Proceedings Real-Time Systems Symposium.

[10]  Wei Zhang,et al.  WCET analysis of instruction caches with prefetching , 2007, LCTES '07.

[11]  Robert A. Walker,et al.  Interrupt Triggered Software Prefetching for Embedded CPU Instruction Cache , 2006, 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'06).

[12]  Andy J. Wellings,et al.  Adding instruction cache effect to schedulability analysis of preemptive real-time systems , 1996, Proceedings Real-Time Technology and Applications.

[13]  Chang-Gun Lee,et al.  Enhanced analysis of cache-related preemption delay in fixed-priority preemptive scheduling , 1996, Proceedings Real-Time Systems Symposium.

[14]  Chang-Gun Lee,et al.  Bounding Cache-Related Preemption Delay for Real-Time Systems , 2001, IEEE Trans. Software Eng..

[15]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[16]  Sharad Malik,et al.  Retargetable static timing analysis for embedded software , 2001, International Symposium on System Synthesis (IEEE Cat. No.01EX526).

[17]  Todd C. Mowry,et al.  Cooperative prefetching: compiler and hardware support for effective instruction prefetching in modern processors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[18]  Alan Jay Smith,et al.  Sequential Program Prefetching in Memory Hierarchies , 1978, Computer.

[19]  Petru Eles,et al.  Bus Access Optimization for Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip , 2007, RTSS.

[20]  Trevor N. Mudge,et al.  Wrong-path instruction prefetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[21]  Wei Zhang,et al.  WCET Analysis for Multi-Core Processors with Shared L2 Instruction Caches , 2008, 2008 IEEE Real-Time and Embedded Technology and Applications Symposium.

[22]  Microsystems Sun,et al.  Jini^ Architecture Specification Version 2.0 , 2003 .

[23]  Jyh-Charn Liu,et al.  Deterministic upperbounds of the worst-case execution times of cached programs , 1994, 1994 Proceedings Real-Time Systems Symposium.

[24]  Isabelle Puaut,et al.  WCET-centric software-controlled instruction caches for hard real-time systems , 2006, 18th Euromicro Conference on Real-Time Systems (ECRTS'06).

[25]  Reinhard Wilhelm,et al.  On Predicting Data Cache Behavior for Real-Time Systems , 1998, LCTES.

[26]  Anthony Rowe,et al.  FireFly Mosaic: A Vision-Enabled Wireless Sensor Networking System , 2007, RTSS 2007.

[27]  Rolf Ernst,et al.  Worst case timing analysis of input dependent data cache behavior , 2006, 18th Euromicro Conference on Real-Time Systems (ECRTS'06).

[28]  Sang Lyul Min,et al.  Analysis of cache-related preemption delay in fixed-priority preemptive scheduling , 1998, 17th IEEE Real-Time Systems Symposium.

[29]  Sharad Malik,et al.  Cache modeling for real-time software: beyond direct mapped instruction caches , 1996, 17th IEEE Real-Time Systems Symposium.

[30]  Damien Hardy,et al.  WCET Analysis of Multi-level Non-inclusive Set-Associative Instruction Caches , 2008, 2008 Real-Time Systems Symposium.

[31]  Tulika Mitra,et al.  Accurate estimation of cache-related preemption delay , 2003, First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721).

[32]  Ann Marie Grizzaffi Maynard,et al.  Contrasting characteristics and cache performance of technical and multi-user commercial workloads , 1994, ASPLOS VI.

[33]  Sang Lyul Min,et al.  An Accurate Worst Case Timing Analysis for RISC Processors , 1995, IEEE Trans. Software Eng..

[34]  Sharad Malik,et al.  Efficient microarchitecture modeling and path analysis for real-time software , 1995, Proceedings 16th IEEE Real-Time Systems Symposium.

[35]  Jakob Engblom,et al.  Requirements for and Design of a Processor with Predictable Timing , 2004, Design of Systems with Predictable Behaviour.

[36]  Stephan Thesing,et al.  Safe and precise WCET determination by abstract interpretation of pipeline models , 2004 .

[37]  Stephan Thesing,et al.  Pipeline Modeling for Timing Analysis , 2002, SAS.

[38]  Sang Lyul Min,et al.  A worst case timing analysis technique for instruction prefetch buffers , 1994, Microprocess. Microprogramming.

[39]  B. R. Rau,et al.  HPL-PD Architecture Specification:Version 1.1 , 2000 .

[40]  David B. Whalley,et al.  Integrating the timing analysis of pipelining and instruction caching , 1995, Proceedings 16th IEEE Real-Time Systems Symposium.

[41]  J. Torrellas,et al.  Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[42]  Hansoo Kim,et al.  Region-based Register Allocation for EPIC Architectures , 2000 .

[43]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[44]  Rolf Ernst,et al.  Multiple process execution in cache related preemption delay analysis , 2004, EMSOFT '04.

[45]  Xianfeng Li,et al.  Modeling out-of-order processors for WCET analysis , 2006, Real-Time Systems.

[46]  Frank Müller,et al.  Generalizing timing predictions to set-associative caches , 1997, Proceedings Ninth Euromicro Workshop on Real Time Systems.

[47]  Frank Mueller,et al.  Bounding worst-case data cache behavior by analytically deriving cache reference patterns , 2005, 11th IEEE Real Time and Embedded Technology and Applications Symposium.

[48]  K. Kavi Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .

[49]  Sharad Malik,et al.  Performance analysis of embedded software using implicit path enumeration , 1997, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[50]  Pascal Sainrat,et al.  Difficulties in Computing the WCET for Processors with Speculative Execution , 2002 .

[51]  Gary S. Tyson,et al.  Branch history guided instruction prefetching , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.