A distributed energy-aware task mapping to achieve thermal balancing and improve reliability of many-core systems

Investigating novel techniques to improve the lifetime, reliability, and thermal management of many-core embedded systems is a fundamental challenge for the semiconductor industry. An imbalanced mapping of applications may considerably degrade system performance and lifetime because of thermal issues in the integrated circuit, such as hotspot zones. Traditional mapping techniques focus on local optimizations, e.g., minimizing the number of hops between communicating tasks, which may lead to hotspot zones and to the underutilization of some processing resources. This paper proposes a runtime mapping heuristic whose cost function targets temporal workload and energy consumption balance in large-scale systems. The proposed heuristic minimizes the occurrence of hotspots by distributing the application workload uniformly across the processing elements, which contributes to a balanced thermal distribution across the system. These features improve system reliability and postpone aging effects. Results with several benchmarks executing on a cycle-accurate platform model show more uniform system utilization for the proposed heuristic than for conventional mapping approaches.
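The contrast between a hop-minimizing mapper and a balance-oriented cost function can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's heuristic: the 4x4 mesh, the per-task energy values, the capacity of two tasks per PE, the weights in `balanced_cost`, and every name in the code (`PE`, `hop_cost`, `balanced_cost`, `map_app`, `run`) are hypothetical. Accumulated energy stands in for the temporal workload history of each processing element.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class PE:
    """A processing element in a 2D mesh NoC (hypothetical model)."""
    x: int
    y: int
    energy: float = 0.0  # accumulated energy estimate (arbitrary units)
    tasks: int = 0       # tasks currently mapped on this PE

def hops(a: PE, b: PE) -> int:
    # With XY routing on a mesh, the hop count is the Manhattan distance.
    return abs(a.x - b.x) + abs(a.y - b.y)

def hop_cost(pe: PE, parent: Optional[PE]) -> float:
    # Baseline: only minimize the distance to the communicating (parent) task.
    return 0.0 if parent is None else float(hops(pe, parent))

def balanced_cost(pe: PE, parent: Optional[PE], w_energy: float = 1.0,
                  w_load: float = 1.0, w_hops: float = 0.1) -> float:
    # Assumed balance-oriented cost: energy history + current load,
    # plus a small communication term. The weights are illustrative only.
    h = 0 if parent is None else hops(pe, parent)
    return w_energy * pe.energy + w_load * pe.tasks + w_hops * h

CostFn = Callable[[PE, Optional[PE]], float]

def map_app(mesh: list[PE], task_energies: Sequence[float],
            cost: CostFn, capacity: int) -> list[PE]:
    """Greedily map a pipeline of tasks; task i communicates with task i-1."""
    placed: list[PE] = []
    parent: Optional[PE] = None
    for e in task_energies:
        free = [pe for pe in mesh if pe.tasks < capacity]
        # Lowest cost wins; ties are broken by mesh position (row, then column).
        target = min(free, key=lambda pe: (cost(pe, parent), pe.y, pe.x))
        target.tasks += 1
        target.energy += e  # charge the task's energy to its host PE
        placed.append(target)
        parent = target
    return placed

def run(cost: CostFn, n_apps: int = 8,
        app: Sequence[float] = (3.0, 1.0, 2.0, 1.0),
        width: int = 4, height: int = 4,
        capacity: int = 2) -> tuple[float, int]:
    """Map n_apps application instances in sequence.

    Returns (max - min accumulated energy, number of PEs ever used).
    """
    mesh = [PE(x, y) for y in range(height) for x in range(width)]
    for _ in range(n_apps):
        for pe in map_app(mesh, app, cost, capacity):
            pe.tasks -= 1  # application finishes: free the slot, keep energy
    energies = [pe.energy for pe in mesh]
    return max(energies) - min(energies), sum(1 for e in energies if e > 0)

if __name__ == "__main__":
    print("hop-only:", run(hop_cost))
    print("balanced:", run(balanced_cost))
```

In this toy setting the hop-only mapper co-locates communicating tasks, so every application instance lands on the same two PEs and their energy keeps growing, while the balance-oriented cost steers new tasks toward PEs with little energy history and ends up using all 16 PEs of the mesh. The small hop weight keeps communicating tasks close without letting distance dominate the decision.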
