The Impact of Parallel Programming Interfaces on the Aging of a Multicore Embedded Processor

To meet the increasing performance demands of applications, the number of cores in a single chip package has been increasing. However, heat has been rising at a higher rate, which accelerates the aging process in modern processors. Therefore, wisely balancing the use of resources is important to extend the processor's longevity. Frequently, performance stagnates once a certain number of concurrent threads are executing. In such cases, the only result of adding threads is a temperature rise that directly influences the aging process, reducing the processor lifetime. This imbalance between threads can originate from many factors, including the way threads communicate and synchronize. Considering that those characteristics are related to the Parallel Programming Interface (PPI) used to parallelize the application, this work evaluates three widely used PPIs (OpenMP, PThreads, and MPI) executing on an embedded multicore processor. We show that, depending on the characteristics of the application, it is possible to reduce the effects of aging merely by switching from one PPI to another. To quantify this, we have developed a model based on the Arrhenius equation. We show that OpenMP has a lower impact on processor aging for memory-bound applications: up to 38% and 68% lower than PThreads and MPI, respectively. On the other hand, PThreads presents the lowest impact on processor aging for CPU-bound applications.
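As a rough illustration of how an Arrhenius-based model ties temperature to aging, the sketch below computes the standard Arrhenius acceleration factor, AF = exp((Ea / k) * (1 / T_ref - 1 / T)), and averages it over a temperature trace. This is not the model developed in this work: the activation energy of 0.7 eV, the reference temperature, the function names, and the sample traces are all assumptions chosen only for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(temp_c, ref_temp_c, activation_energy_ev=0.7):
    """Arrhenius acceleration factor of temp_c relative to ref_temp_c.

    Temperatures are in degrees Celsius. The default activation energy
    (0.7 eV) is an assumed placeholder, not a value from the paper.
    """
    temp_k = temp_c + 273.15
    ref_k = ref_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / ref_k - 1.0 / temp_k)
    return math.exp(exponent)


def relative_aging(temps_c, ref_temp_c, activation_energy_ev=0.7):
    """Mean acceleration factor over equally spaced temperature samples."""
    factors = [acceleration_factor(t, ref_temp_c, activation_energy_ev) for t in temps_c]
    return sum(factors) / len(factors)


if __name__ == "__main__":
    # Hypothetical temperature traces (degrees Celsius) for two runs of the
    # same application; these numbers are made up for the example.
    trace_a = [55.0, 58.0, 60.0, 59.0]
    trace_b = [62.0, 66.0, 68.0, 67.0]
    ref = 50.0
    print("relative aging, run A:", relative_aging(trace_a, ref))
    print("relative aging, run B:", relative_aging(trace_b, ref))
```

Under these assumptions, a run that keeps the chip hotter yields a larger mean acceleration factor, which is the sense in which a PPI that raises temperature without improving performance shortens the processor lifetime.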
