Lattice-Based Memory Allocation

We investigate the problem of memory reuse in order to reduce the memory needed to store an array variable. We develop techniques that can lead to smaller memory requirements in the synthesis of dedicated processors, or to more effective use of software-controlled scratchpad memory by compiled code. Memory reuse is well understood for allocating registers to hold scalar variables. Its extension to arrays has been studied recently for multimedia applications, for loop parallelization, and for circuit synthesis from recurrence equations. In all such studies, the introduction of modulo operations to an otherwise affine mapping (of loop or array indices to memory locations) achieves the desired reuse. We develop here a new mathematical framework, based on critical lattices, that subsumes the previous approaches and provides new insight. We first consider the set of indices that conflict, i.e., those that cannot be mapped to the same memory cell. Next, we construct the set of differences of conflicting indices. We establish a correspondence between a valid modular mapping and a strictly admissible integer lattice, one having no nonzero element in common with the set of conflicting index differences. The memory required by an optimal modular mapping is equal to the determinant of the corresponding lattice. The memory reuse problem is thus reduced to the (still interesting and nontrivial) problem of finding a strictly admissible integer lattice of least determinant. We then propose and analyze several practical strategies for finding strictly admissible integer lattices, either optimal or optimal up to a multiplicative factor, and, hence, memory-saving modular mappings. We explain and analyze previous approaches in terms of our new framework.
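The central correspondence in the abstract, namely that a modular mapping is valid exactly when its lattice contains no nonzero conflicting index difference, can be sketched in code. The following is a minimal illustration and is not taken from the paper: the function name and the toy conflict set are hypothetical, and the membership test assumes the set of conflicting differences is finite and given explicitly.

```python
import numpy as np

def is_strictly_admissible(basis, diffs):
    """Return True if the integer lattice spanned by the columns of
    `basis` contains no nonzero vector from `diffs` (strict admissibility)."""
    inv = np.linalg.inv(np.asarray(basis, dtype=float))
    for d in diffs:
        d = np.asarray(d, dtype=float)
        if not d.any():
            continue  # the zero difference never blocks admissibility
        coords = inv @ d  # coordinates of d in the lattice basis
        if np.allclose(coords, np.round(coords)):
            return False  # a nonzero conflict difference is a lattice point
    return True

# Toy 1D example: index i conflicts with i-2 .. i+2 (a live window of
# width 3), so the conflicting differences are {-2, -1, 1, 2}.
diffs = [(-2,), (-1,), (1,), (2,)]

# The lattice 3Z is strictly admissible: its determinant is 3, so the
# modular mapping i -> i mod 3 is valid and uses 3 memory cells.
print(is_strictly_admissible([[3]], diffs))  # True

# The lattice 2Z is not: the conflicting difference 2 is a lattice point,
# so i -> i mod 2 would store two live values in the same cell.
print(is_strictly_admissible([[2]], diffs))  # False
```

The determinant of the chosen lattice (here 3) is exactly the memory footprint of the corresponding modular mapping; minimizing it over all strictly admissible lattices is the optimization problem the paper studies.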
