Run-time Requirement Enforcement for Loop Programs on Processor Arrays

Loop bounds are often unknown until run time, which makes it difficult to analyze non-functional properties such as latency at compile time. Similarly, static allocations of processing resources to loop computations may be too conservative for given performance requirements, or suboptimal with respect to energy consumption. To satisfy requirements when accelerating loop nests despite this uncertainty in loop bounds, we formalize and propose an approach to run-time requirement enforcement: at run time, a mapping is selected from a set of candidates such that it satisfies a given set of requirements while optimizing secondary objectives. Because the search space of candidate mappings may be prohibitively large to evaluate at run time, we further introduce two approaches to reduce its cardinality: 1) architecture-specific reduction, which solves for parts of the mapping from the requirements, and 2) design-time reduction, which finds a k-subset of mappings that maximizes the number of loop bounds for which the requirements are satisfied. We implemented the proposed run-time requirement enforcement techniques for a representative class of programmable processor array architectures, tightly coupled processor arrays (TCPAs), and demonstrate their effectiveness in a case study: given latency requirements can be satisfied while saving up to 10% in energy.
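The run-time selection and the design-time k-subset reduction described above can be sketched in Python. This is a minimal sketch under explicitly hypothetical assumptions. A mapping is reduced to a single parameter, the number of processing elements `pes`. The `latency` and `energy` methods are invented toy cost models, not TCPA models. The names `enforce`, `satisfied_bounds`, and `reduce_candidates` are illustrative. The design-time step uses a greedy maximum-coverage heuristic, which is swapped in for illustration and is not necessarily the authors' method.

```python
from dataclasses import dataclass
from math import ceil
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class Mapping:
    """Hypothetical candidate mapping: a loop nest spread over `pes` processing elements."""
    pes: int

    def latency(self, n: int) -> int:
        # Assumed toy model: each PE runs ceil(n / pes) iterations at 4 cycles each,
        # plus a fill/drain cost of 2 cycles per PE.
        return 4 * ceil(n / self.pes) + 2 * self.pes

    def energy(self, n: int) -> float:
        # Assumed toy model: 1 unit of dynamic energy per iteration plus
        # 0.5 units of static energy per active PE per cycle of latency.
        return 1.0 * n + 0.5 * self.pes * self.latency(n)


def enforce(candidates: Sequence[Mapping], n: int, max_latency: int) -> Optional[Mapping]:
    """Run-time step: among the candidates that meet the latency requirement for the
    actual loop bound n, return the one with the lowest energy (None if none fits)."""
    feasible = [m for m in candidates if m.latency(n) <= max_latency]
    return min(feasible, key=lambda m: m.energy(n), default=None)


def satisfied_bounds(m: Mapping, bounds: Iterable[int], max_latency: int) -> set[int]:
    """Loop bounds for which mapping m meets the latency requirement."""
    return {n for n in bounds if m.latency(n) <= max_latency}


def reduce_candidates(candidates: Sequence[Mapping], bounds: Sequence[int],
                      max_latency: int, k: int) -> list[Mapping]:
    """Design-time step (greedy maximum-coverage heuristic): pick up to k mappings that
    together satisfy the requirement for as many loop bounds as possible.
    Stops early once no remaining mapping covers an additional bound."""
    chosen: list[Mapping] = []
    covered: set[int] = set()
    for _ in range(k):
        remaining = [m for m in candidates if m not in chosen]
        if not remaining:
            break
        # Pick the mapping that newly covers the most loop bounds.
        best = max(remaining,
                   key=lambda m: len(satisfied_bounds(m, bounds, max_latency) - covered))
        gained = satisfied_bounds(best, bounds, max_latency) - covered
        if not gained:
            break
        chosen.append(best)
        covered |= gained
    return chosen


if __name__ == "__main__":
    candidates = [Mapping(p) for p in (1, 2, 4, 8, 16)]
    print(enforce(candidates, 100, 150))   # Mapping(pes=4)
    print(enforce(candidates, 1000, 150))  # None
    print(reduce_candidates(candidates, [10, 100, 400], 150, 2))  # [Mapping(pes=16)]
```

In this toy setting, the run-time step for n = 100 picks 4 PEs (latency 108, energy 316) rather than 16 PEs (latency 60, energy 580), because both meet the 150-cycle bound. The coverage-only reduction, however, keeps just `Mapping(pes=16)`, so after reduction the energy for n = 100 rises to 580. A reduction that also weighs the secondary objective could avoid this loss.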
