Dynamic application allocation with resource balancing on NoC based many-core embedded systems

Abstract: Managing on-chip resources for future embedded applications that execute concurrently on a NoC (network-on-chip) based many-core embedded system (MES) is a fundamental challenge. Embedded applications must be allocated under constraints on computing resources, communication resources, or both. However, most existing techniques focus only on optimizing communication between application threads and ignore the balanced utilization of on-chip resources, which is critical for embedded systems. In this paper, we propose a dynamic resource balance (DRB) algorithm that achieves higher system performance by balancing the utilization of on-chip computing and communication resources. The DRB algorithm first constructs a mapping scheme using a dynamic communication optimization (DCO) algorithm, and then uses a multi-rectangle selection (MRS) algorithm to choose a corresponding number of resource regions in which to allocate the application under that mapping scheme. We evaluate the DRB algorithm in the Graphite simulator. The results show that DRB improves system throughput by up to 31.6%, 25.2%, and 9.4% compared with the FF (First Free), NN (Nearest Neighbor), and CoNA-SHiC (Contiguous Neighbor Allocation and Smart Hill Climbing) algorithms, respectively.
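
To make the baselines named in the abstract concrete, the following is a minimal sketch of First Free and Nearest Neighbor style core selection on a 2D mesh. It is an illustration only, not the paper's DRB, DCO, or MRS algorithms, whose details the abstract does not give. The function names, the `(row, col)` core representation, and the pairwise hop count used as a communication-cost proxy are all assumptions made for this example, and the Nearest Neighbor variant is simplified to rank cores by distance from a single start core.

```python
from itertools import product


def manhattan(a, b):
    """Hop distance between two cores on a 2D mesh (XY routing assumed)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def first_free(free, n_threads):
    """FF-style baseline: take the first free cores in row-major scan order."""
    chosen = sorted(free)[:n_threads]
    if len(chosen) < n_threads:
        raise ValueError("not enough free cores")
    return chosen


def nearest_neighbor(free, n_threads, start):
    """Simplified NN-style baseline: take the free cores closest to `start`.

    Ties are broken by coordinate so the result is deterministic.
    """
    ranked = sorted(free, key=lambda c: (manhattan(c, start), c))
    if len(ranked) < n_threads:
        raise ValueError("not enough free cores")
    return ranked[:n_threads]


def total_hops(cores):
    """Sum of pairwise hop distances: a crude proxy for communication cost."""
    return sum(
        manhattan(a, b)
        for i, a in enumerate(cores)
        for b in cores[i + 1:]
    )


# Hypothetical 4x4 mesh with three cores already occupied.
mesh = set(product(range(4), range(4)))
busy = {(0, 1), (0, 2), (1, 0)}
free = mesh - busy

ff = first_free(free, 4)
nn = nearest_neighbor(free, 4, start=(2, 2))

print("FF cores:", ff, "pairwise hops:", total_hops(ff))
print("NN cores:", nn, "pairwise hops:", total_hops(nn))
```

In this toy setup the scan-order choice scatters threads around the occupied cores, while the distance-ranked choice keeps them clustered, which lowers the hop count. Neither heuristic considers how evenly computing and communication resources are used across the chip, which is the gap the abstract says DRB targets.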
