WCET-Aware Assembly Level Optimizations

The major shortcoming of source code optimizations is their lack of knowledge about the underlying architecture. Hence, the development of transformations that exploit processor-specific features is limited or even infeasible, and the full optimization potential cannot be exploited. In contrast, assembly level optimizations operate on a code representation that reflects the code that is finally executed. Thus, the compiler is fully aware of numerous critical details about the resources used during execution. In this chapter, novel WCET-aware assembly level optimizations are discussed, namely WCET-aware procedure positioning and WCET-aware trace scheduling.
