Reducing memory requirements of nested loops for embedded systems

Most embedded systems have a limited amount of memory, yet the memory requirements of the code running on them, loops in particular, are significant. This paper addresses the problem of estimating the amount of memory needed for data transfers in embedded systems. We analyze the problem of estimating the region associated with a statement, i.e., the set of elements referenced by the statement during the execution of the entire set of nested loops. We present a quantitative analysis of the number of elements referenced: exact expressions are derived for uniformly generated references, and close upper and lower bounds for non-uniformly generated references. In addition to presenting an algorithm that computes the total memory required, we discuss the effect of transformations on the lifetimes of array variables, i.e., the time between the first and last accesses to a given array location. We give a detailed analysis of the effect of unimodular transformations on data locality, introduce the term maximum window size, and derive quantitative expressions to compute it. The smaller the maximum window size, the higher the data locality of the loop.
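
To make the two quantities in the abstract concrete, the following is a minimal brute-force sketch in Python, not the closed-form analysis of the paper. It enumerates a small rectangular loop nest, counts the distinct elements touched by a single reference, and measures a window size. Several things here are assumptions for illustration only: the helper names (`execution_order`, `referenced_elements`, `max_window_size`), the example reference `A[i + j]`, the loop bounds, and the reading of "maximum window size" as the largest number of array elements that are simultaneously live, where an element is live from its first access through its last access. A unimodular transformation `T` is modeled by executing iterations in lexicographic order of `T` applied to the iteration vector.

```python
from itertools import product


def execution_order(bounds, T=None):
    """Return the iteration vectors of a rectangular loop nest in execution order.

    bounds: trip counts, one per loop; loop k runs over 0 .. bounds[k] - 1.
    T: optional unimodular matrix given as a list of rows (an assumption of
       this sketch: the transformed nest executes in lexicographic order of
       T applied to the original iteration vector).
    """
    points = list(product(*(range(n) for n in bounds)))
    if T is None:
        return points  # product() already yields lexicographic order

    def transformed(p):
        return tuple(sum(row[k] * p[k] for k in range(len(p))) for row in T)

    return sorted(points, key=transformed)


def referenced_elements(order, ref):
    """Set of distinct array elements touched by one reference over the nest."""
    return {ref(p) for p in order}


def max_window_size(order, ref):
    """Largest number of simultaneously live elements (assumed definition).

    An element is treated as live from its first access through its last
    access, inclusive.
    """
    first, last = {}, {}
    for t, p in enumerate(order):
        e = ref(p)
        first.setdefault(e, t)  # record only the first access time
        last[e] = t             # overwritten until the final access

    # Sweep over time with a difference array: +1 at first access,
    # -1 just after the last access.
    delta = [0] * (len(order) + 1)
    for e in first:
        delta[first[e]] += 1
        delta[last[e] + 1] -= 1

    live = best = 0
    for d in delta:
        live += d
        best = max(best, live)
    return best


# Hypothetical example: for i in 0..3, for j in 0..9, reference A[i + j].
ref = lambda p: p[0] + p[1]
original = execution_order([4, 10])
interchanged = execution_order([4, 10], T=[[0, 1], [1, 0]])  # loop interchange

print(len(referenced_elements(original, ref)))
print(max_window_size(original, ref))
print(max_window_size(interchanged, ref))
```

Under these assumptions the reference touches 13 distinct elements regardless of loop order, while the window shrinks from 9 to 3 once the shorter loop is moved innermost, which illustrates how a unimodular transformation can change lifetimes without changing the set of elements referenced.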
