An optimal allocation of memory buffers for complex multicore platforms

In deeply embedded heterogeneous multicores, the allocation of data to memories is crucial for application performance. For applications with stringent throughput constraints, the allocation is often done manually, by carefully assigning static memory locations to the logical buffers of the application. Today, designers are confronted with applications with thousands of buffers and architectures with hundreds of memories, which makes manual approaches impractical. In this paper we present an automatic approach for statically allocating logical buffers to physical memories, assuming a fixed task-to-processor mapping and respecting multiple throughput constraints. In our approach, we model the application in a data-centric way, by explicitly defining buffers and associating them with the computational tasks that access them within well-specified time intervals. In addition, we use an architecture model that makes the allocation aware of the topology of the multicore and of the physical bandwidth constraints of the interconnect. We present a layered approach to describe and solve the buffer-allocation problem, as well as related subproblems, using mixed-integer linear programming. We show that the buffer-allocation problem is NP-complete, and present a more scalable formulation as a semi-definite programming problem. We evaluate the proposed LP methods by allocating around 1000 buffers, corresponding to processing one frame in the Long-Term Evolution (LTE) standard, onto a multicore with 80 processing elements. We introduce a solution approach that finds an optimal allocation in around 2 hours, which is at least two orders of magnitude faster than a straightforward formulation.
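To make the problem statement concrete, the following is a minimal sketch of a toy buffer-allocation instance. It is not the paper's mixed-integer formulation: it uses plain exhaustive search, which is only viable for a handful of buffers. All names and numbers (`buffers`, `memories`, the sizes, lifetimes, capacities, and per-access costs) are hypothetical, and the interconnect topology, bandwidth, and throughput constraints from the abstract are left out. The only constraint modeled is that the buffers live at the same time in one memory must fit within its capacity.

```python
from itertools import product

# Hypothetical toy instance (assumed for illustration, not taken from the paper).
# Each buffer has a size and a half-open lifetime [start, end) in time steps.
buffers = {
    "b0": {"size": 4, "live": (0, 2)},
    "b1": {"size": 4, "live": (1, 3)},
    "b2": {"size": 6, "live": (2, 4)},
}
# Each memory has a capacity and an assumed cost per buffer placed in it.
memories = {
    "local": {"capacity": 8, "cost": 1},
    "shared": {"capacity": 16, "cost": 5},
}


def feasible(assignment):
    """Return True if no memory exceeds its capacity at any time step."""
    horizon = max(b["live"][1] for b in buffers.values())
    for t in range(horizon):
        used = {m: 0 for m in memories}
        for name, mem in assignment.items():
            start, end = buffers[name]["live"]
            if start <= t < end:
                used[mem] += buffers[name]["size"]
        if any(used[m] > memories[m]["capacity"] for m in memories):
            return False
    return True


def allocate():
    """Try every buffer-to-memory assignment and keep the cheapest feasible one."""
    best, best_cost = None, None
    names = list(buffers)
    for choice in product(memories, repeat=len(names)):
        assignment = dict(zip(names, choice))
        if not feasible(assignment):
            continue
        cost = sum(memories[m]["cost"] for m in assignment.values())
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


if __name__ == "__main__":
    print(allocate())
```

In this instance `b1` and `b2` overlap in time and together exceed the capacity of `local`, so one of them has to go to `shared`. The search space grows as the number of memories raised to the number of buffers, which is why instances of the size described above call for a mathematical-programming formulation rather than enumeration.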
