Bounding Worst-Case DRAM Performance on Multicore Processors

Bounding worst-case DRAM performance for a real-time application is a challenging problem that is critical for computing the worst-case execution time (WCET), especially on multicore processors, where DRAM is usually shared by all cores. DRAM commands from consecutive accesses can typically be pipelined on DRAM devices, depending on the spatial locality of the data they fetch. We propose a basic approach that bounds worst-case DRAM performance by accounting for this command pipelining. We also propose an enhanced approach that reduces the overestimation caused by invalid DRAM access sequences by checking the timing order of the co-running applications on a dual-core processor. Our experimental results show that, compared with a conservative approach that assumes no DRAM command pipelining, the basic approach tightens the WCET bound by 15.73% on average. The enhanced approach tightens the bound by a further 4.23% on average relative to the basic approach.
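To make the contrast between a conservative bound and a pipelining-aware bound concrete, the following is a minimal toy sketch, not the model used in the paper. The timing values, the cost rule (only a row conflict in the same bank as the immediately preceding access pays the full latency; every other access is assumed to overlap its commands with the previous transfer), and all function names are hypothetical assumptions chosen for illustration. The worst case is taken over every interleaving of two per-core access sequences that preserves each core's own order.

```python
from typing import Iterator, List, Tuple

# Hypothetical timing parameters in DRAM clock cycles (illustrative only).
T_RP, T_RCD, T_CL, T_BURST = 3, 3, 3, 4
FULL_ACCESS = T_RP + T_RCD + T_CL + T_BURST  # precharge + activate + read + burst

Access = Tuple[int, int]  # (bank, row)


def conservative_bound(seq: List[Access]) -> int:
    """Assume no command pipelining: every access pays the full latency."""
    return len(seq) * FULL_ACCESS


def pipelined_bound(seq: List[Access]) -> int:
    """Toy pipelining-aware cost of one fixed access sequence.

    Assumption: the first access, and any access to a different row of the
    same bank as the previous access, pays FULL_ACCESS; all other accesses
    are assumed to hide their setup behind the previous data transfer.
    """
    total = 0
    prev = None
    for bank, row in seq:
        if prev is None or (bank == prev[0] and row != prev[1]):
            total += FULL_ACCESS
        else:
            total += T_BURST
        prev = (bank, row)
    return total


def interleavings(a: List[Access], b: List[Access]) -> Iterator[List[Access]]:
    """Yield every merge of a and b that keeps each core's program order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def worst_case_bound(a: List[Access], b: List[Access]) -> int:
    """Maximum toy pipelined cost over all interleavings of two cores."""
    return max(pipelined_bound(s) for s in interleavings(a, b))


if __name__ == "__main__":
    core_a = [(0, 1), (0, 1)]
    core_b = [(1, 5)]
    print("conservative:", conservative_bound(core_a + core_b))
    print("pipelining-aware worst case:", worst_case_bound(core_a, core_b))
```

In this toy model, discarding interleavings that cannot occur given the timing order of the two applications would shrink the set passed to `max`, which is the intuition behind tightening the bound further; the sketch enumerates all interleavings and does not implement that pruning.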
