Optimizing Instruction Prefetching to Improve Worst-Case Performance for Real-Time Applications

While the average-case performance is important for general-purpose applications, worst-case performance is crucial for real-time systems to ensure schedulability and reliability. Recent work has shown that simple prefetching techniques such as the Next-N-Line prefetching can benefit both average-case and worst-case performance; however, the improvement on the worstcase execution time (WCET) is rather limited and inefficient. This paper presents two instruction prefetching approaches that are specially designed to enhance the worst-case performance, including the loop-based prefetching and WCET-oriented prefetching. Our experiments indicate that both instruction prefetching techniques can achieve better worst-case execution cycles than the Next-N-Line prefetching while having various impacts on the average-case performance.

[1]  K. Kavi Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .

[2]  David B. Whalley,et al.  Bounding worst-case instruction cache performance , 1994, 1994 Proceedings Real-Time Systems Symposium.

[3]  Sharad Malik,et al.  Efficient microarchitecture modeling and path analysis for real-time software , 1995, Proceedings 16th IEEE Real-Time Systems Symposium.

[4]  Gary S. Tyson,et al.  Branch history guided instruction prefetching , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[5]  Rolf Ernst,et al.  Worst case timing analysis of input dependent data cache behavior , 2006, 18th Euromicro Conference on Real-Time Systems (ECRTS'06).

[6]  Frank Müller,et al.  Generalizing timing predictions to set-associative caches , 1997, Proceedings Ninth Euromicro Workshop on Real Time Systems.

[7]  Sharad Malik,et al.  Performance Analysis of Embedded Software Using Implicit Path Enumeration , 1995, 32nd Design Automation Conference.

[8]  Frank Mueller,et al.  Bounding worst-case data cache behavior by analytically deriving cache reference patterns , 2005, 11th IEEE Real Time and Embedded Technology and Applications Symposium.

[9]  Glenn Reinman,et al.  Fetch directed instruction prefetching , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[10]  Microsystems Sun,et al.  Jini^ Architecture Specification Version 2.0 , 2003 .

[11]  Robert A. Walker,et al.  Interrupt Triggered Software Prefetching for Embedded CPU Instruction Cache , 2006, 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'06).

[12]  James E. Smith,et al.  Prefetching in supercomputer instruction caches , 1992, Proceedings Supercomputing '92.

[13]  Dirk Grunwald,et al.  Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[14]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[15]  Todd C. Mowry,et al.  Cooperative prefetching: compiler and hardware support for effective instruction prefetching in modern processors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[16]  Wei Zhang,et al.  WCET analysis of instruction caches with prefetching , 2007, LCTES '07.

[17]  Byung Suk Lee,et al.  Transformation of Continuous Aggregation Join Queries over Data Streams , 2007, J. Comput. Sci. Eng..

[18]  Reinhard Wilhelm,et al.  On Predicting Data Cache Behavior for Real-Time Systems , 1998, LCTES.

[19]  Alan Jay Smith,et al.  Sequential Program Prefetching in Memory Hierarchies , 1978, Computer.

[20]  Josep Torrellas,et al.  Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses , 1996, ISCA.

[21]  John Paul Shen,et al.  Hardware Support for Prescient Instruction Prefetch , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[22]  Jakob Engblom,et al.  The worst-case execution-time problem—overview of methods and survey of tools , 2008, TECS.

[23]  Sang Lyul Min,et al.  A worst case timing analysis technique for instruction prefetch buffers , 1994, Microprocess. Microprogramming.

[24]  B. R. Rau,et al.  HPL-PD Architecture Specification:Version 1.1 , 2000 .

[25]  David B. Whalley,et al.  Integrating the timing analysis of pipelining and instruction caching , 1995, Proceedings 16th IEEE Real-Time Systems Symposium.

[26]  J. Torrellas,et al.  Instruction Prefetching of Systems Codes with Layout Optimized for Reduced Cache Misses , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[27]  Rajesh Gupta Guest editorial: Special issue on networked embedded systems , 2004, TECS.

[28]  Nikil D. Dutt,et al.  Memory data organization for improved cache performance in embedded processor applications , 1997, TODE.