A scalable and near-optimal representation of access schemes for memory management

Memory management searches for the resources required to store the concurrently alive elements. The solution quality is affected by the representation of the element accesses: a sub-optimal representation leads to overestimation, and a non-scalable representation increases the exploration time. We propose a methodology for representing regular and irregular accesses in a near-optimal and scalable way. The representation consists of a set of pattern entries that compactly describe the behavior of the memory accesses and a set of pattern operations that consistently combine the pattern entries. The result is a final sequence of pattern entries that represents the global access scheme without unnecessary overestimation.
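
To make the idea of pattern entries and pattern operations concrete, the following is a minimal sketch under stated assumptions, not the paper's actual formalism. The abstract does not define the fields of an entry or the set of operations, so the `PatternEntry` fields (`start`, `length`, `period`, `repeats`) and the `combine` operation are hypothetical names chosen for illustration. The sketch shows one property the abstract emphasizes: two entries are merged into a single entry only when the merged entry covers exactly the same elements, so no overestimation is introduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternEntry:
    """Hypothetical entry (not from the paper): `repeats` segments of
    `length` consecutive accessed elements, where the k-th segment
    starts at `start + k * period`."""
    start: int
    length: int
    period: int
    repeats: int

    def elements(self) -> set:
        # Enumerate every element index covered by this entry.
        return {self.start + k * self.period + i
                for k in range(self.repeats)
                for i in range(self.length)}


def combine(a: PatternEntry, b: PatternEntry) -> list:
    """Hypothetical pattern operation: union of two entries.

    The entries are merged into one only when they share the same
    period and repeat count and their segments overlap or touch, in
    which case the merged entry covers exactly the union. Otherwise
    both entries are kept, so nothing is overestimated.
    """
    if a.start > b.start:
        a, b = b, a
    same_shape = a.period == b.period and a.repeats == b.repeats
    touches = b.start <= a.start + a.length
    if same_shape and touches:
        end = max(a.start + a.length, b.start + b.length)
        return [PatternEntry(a.start, end - a.start, a.period, a.repeats)]
    return [a, b]


if __name__ == "__main__":
    a = PatternEntry(start=0, length=2, period=8, repeats=3)
    b = PatternEntry(start=2, length=2, period=8, repeats=3)
    c = PatternEntry(start=5, length=1, period=8, repeats=3)
    print(combine(a, b))  # one merged entry
    print(combine(a, c))  # two entries, kept separate
```

In this sketch, keeping two entries separate when they cannot be merged exactly trades compactness for precision; a real representation would need further operations (for example, for irregular accesses) that the abstract mentions but does not specify.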
