Compiler optimizations for real time execution of loops on limited memory embedded systems

We propose a framework for efficiently partitioning global arrays on embedded systems with limited on-chip memory. The key problem addressed in this work is how to partition the data references encountered in loops between on-chip and off-chip memory so that real-time response requirements are met by keeping the run-time overhead of remote accesses to a minimum. We introduce the concept of a footprint to calculate precisely, at compile time, the memory demand of each reference, and we compute a profit value for each reference from its access frequency and reuse factor. We then develop a methodology based on the 0/1 knapsack algorithm to assign the references to local or remote memory. We show the performance improvements obtained with our approach and compare the results.
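The knapsack formulation can be illustrated with a minimal sketch. It is not the paper's implementation: the `Ref` type, the `partition` function, the integer footprint units, and in particular the profit formula (access frequency multiplied by reuse factor) are assumptions made here for illustration, since the abstract says only that profit is derived from those two quantities. The sketch uses the textbook dynamic-programming solution to the 0/1 knapsack problem, with on-chip capacity as the knapsack size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A hypothetical array reference with compile-time estimates."""
    name: str
    footprint: int    # on-chip memory needed, in assumed integer units
    access_freq: int  # estimated number of accesses
    reuse: int        # estimated reuse factor

    @property
    def profit(self) -> int:
        # ASSUMPTION: the abstract does not give the formula; a simple
        # product of access frequency and reuse factor is used here.
        return self.access_freq * self.reuse


def partition(refs, capacity):
    """Split refs into (local, remote) with the textbook 0/1 knapsack DP.

    best[i][c] is the maximum total profit achievable using the first i
    references with at most c units of on-chip memory.
    """
    n = len(refs)
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, r in enumerate(refs, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # leave reference i off-chip
            if r.footprint <= c:
                candidate = best[i - 1][c - r.footprint] + r.profit
                if candidate > best[i][c]:
                    best[i][c] = candidate  # place reference i on-chip

    # Walk the table backwards to recover which references were chosen.
    local, remote = [], []
    c = capacity
    for i in range(n, 0, -1):
        r = refs[i - 1]
        if best[i][c] != best[i - 1][c]:
            local.append(r)
            c -= r.footprint
        else:
            remote.append(r)
    local.reverse()
    remote.reverse()
    return local, remote


# Hypothetical references; the numbers are made up for illustration.
refs = [
    Ref("A[i]", footprint=40, access_freq=100, reuse=2),
    Ref("B[i][j]", footprint=60, access_freq=50, reuse=1),
    Ref("C[j]", footprint=30, access_freq=80, reuse=3),
]
local, remote = partition(refs, capacity=80)
print([r.name for r in local])   # ['A[i]', 'C[j]']
print([r.name for r in remote])  # ['B[i][j]']
```

With 80 units of on-chip memory, the two references with the highest combined profit that still fit (`A[i]` and `C[j]`, 70 units in total) are kept local, and `B[i][j]` is left in remote memory. The table costs time and space proportional to the number of references times the capacity, so a coarser footprint unit keeps it small.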
