Array Size Computation under Uniform Overlapping and Irregular Accesses

The size required to store an array is crucial for an embedded system, as it affects the memory size, the energy per memory access, and the overall system cost. Existing techniques for finding the minimum number of resources required to store an array are less efficient for code with large loops and irregularly occurring memory accesses. Either they approximate the accessed parts of the array, which overestimates the required resources, or their exploration time grows with the number of distinct accessed parts of the array. We propose a methodology that computes the minimum resources required to store an array while keeping the exploration time low, and that provides a near-optimal result for both regularly and irregularly occurring memory accesses and for overlapping writes and reads.
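To make the quantity concrete: for a fully known access trace, the minimum number of storage locations an array needs is the largest number of values that are live at the same time, that is, written but still awaiting a later read. This assumes any element may be placed in any free location. The Python sketch below computes that exact figure by brute force over a hypothetical trace format: a list of `("w" | "r", index)` pairs executed in order. It also shows a bounding-range estimate, the kind of overestimate that approximating the accessed parts of an array can produce. The function names are illustrative. This is a baseline under these assumptions, not the proposed methodology, and its cost grows with the length of the trace.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical trace format: (op, index) pairs in execution order, op is "w" or "r".
Access = Tuple[str, int]


def value_lifetimes(trace: List[Access]) -> List[Tuple[int, int]]:
    """Return one inclusive (start, end) step interval per written value.

    A value lives from the step that writes it to the last step that reads it
    before the same index is overwritten or the trace ends.
    """
    lifetimes: List[Tuple[int, int]] = []
    open_value: Dict[int, List[int]] = {}  # index -> [start, end] of current value
    for t, (op, idx) in enumerate(trace):
        if op == "w":
            if idx in open_value:  # overwrite: the previous value's lifetime is closed
                lifetimes.append(tuple(open_value[idx]))
            open_value[idx] = [t, t]
        elif op == "r":
            if idx not in open_value:
                raise ValueError(f"read of index {idx} before any write (step {t})")
            open_value[idx][1] = t  # extend the lifetime to this read
        else:
            raise ValueError(f"unknown op {op!r}")
    lifetimes.extend(tuple(v) for v in open_value.values())
    return lifetimes


def max_live(lifetimes: List[Tuple[int, int]]) -> int:
    """Largest number of lifetimes overlapping at any step (ends are inclusive)."""
    delta: Dict[int, int] = defaultdict(int)
    for start, end in lifetimes:
        delta[start] += 1
        delta[end + 1] -= 1  # the location becomes free after the last read
    live = best = 0
    for t in sorted(delta):
        live += delta[t]
        best = max(best, live)
    return best


def bounding_box_size(trace: List[Access]) -> int:
    """Size of the index range [min, max] touched by the trace (an overestimate)."""
    indices = [idx for _, idx in trace]
    return max(indices) - min(indices) + 1 if indices else 0


if __name__ == "__main__":
    # Irregular, non-overlapping accesses: each value dies before the next is written.
    trace = [("w", 0), ("r", 0), ("w", 7), ("r", 7), ("w", 15), ("r", 15)]
    print(max_live(value_lifetimes(trace)))  # 1
    print(bounding_box_size(trace))          # 16
```

For the irregular trace in the usage example, one location suffices, while the bounding range of the touched indices suggests 16. When writes and reads overlap, for example three writes followed by three reads, all three values are live at once and `max_live` reports 3.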
