Advanced compiling techniques to reduce RAM usage of static operating systems

In recent years, a rapidly growing number of small embedded systems have been deployed in very high volumes. One example is the automotive industry, where the number of Electronic Control Units (ECUs) in a single car is approaching 100 for high-end automobiles, and several dozen are used in mid-range cars. Small system-on-chip microcontrollers are often used with static operating systems. Because on-chip RAM is rather expensive and only a few KB of RAM are available on such devices, reducing RAM usage is an important objective in order to save costs, especially in high-volume production. This thesis presents several new approaches to reducing the RAM usage of such systems by applying advanced compilation and optimization techniques. Common optimizations are examined with regard to their impact on RAM usage. Selecting classical optimization algorithms according to their impact on RAM usage reduces the RAM required for a series of test cases by almost 20%. Upper bounds for the stack sizes of application tasks are calculated statically using high-level analysis available in the compiler. Comparisons with a commercial tool working at the machine-code level show clear advantages in maintainability as well as reliability. Most importantly, the register sets stored by the operating system when a task is preempted are optimized by not saving unnecessary registers. Inter-task register allocation further reduces the RAM required to save those task contexts. The new algorithms have been added to a production-quality compiler, and a full commercial OSEK implementation was modified to make use of the new optimizations. Tests on real hardware as well as comparisons with commercial tools show not only that the system works and improves usability and maintainability, but also that significant reductions in RAM requirements, and therefore cost savings, are possible. In a series of benchmarks, RAM usage is reduced on average by 30%–60%.
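
As a rough illustration of what a static stack-size bound involves, the following minimal sketch computes a worst-case stack depth over an acyclic call graph. It is not the thesis's algorithm: the function names, frame sizes, and the `stack_bound` helper are hypothetical, and a real compiler would derive frame sizes and call edges from its own intermediate representation.

```python
# Hypothetical per-function stack frame sizes in bytes (illustrative values).
FRAME_BYTES = {"task_main": 16, "read_sensor": 8, "filter": 24, "send_can": 12}

# Hypothetical call graph: caller -> list of callees.
CALLS = {
    "task_main": ["read_sensor", "filter", "send_can"],
    "read_sensor": [],
    "filter": ["send_can"],
    "send_can": [],
}


def stack_bound(entry, frame_bytes, calls):
    """Return an upper bound on stack usage (bytes) when starting at `entry`.

    The bound is the function's own frame plus the deepest callee chain.
    Recursion makes the bound undefined in this simple model, so a cycle
    in the call graph raises ValueError.
    """
    memo = {}            # function name -> already computed bound
    in_progress = set()  # functions on the current traversal path

    def visit(fn):
        if fn in memo:
            return memo[fn]
        if fn in in_progress:
            raise ValueError(f"recursive call involving {fn!r}")
        in_progress.add(fn)
        deepest_callee = max((visit(c) for c in calls.get(fn, [])), default=0)
        in_progress.discard(fn)
        memo[fn] = frame_bytes[fn] + deepest_callee
        return memo[fn]

    return visit(entry)


if __name__ == "__main__":
    print(stack_bound("task_main", FRAME_BYTES, CALLS))
```

With the assumed values, the deepest chain is `task_main` → `filter` → `send_can`, so the bound is 16 + 24 + 12 bytes. Interrupts, preemption, and indirect calls are deliberately left out of this sketch.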
