Optimized Use of Parallel Programming Interfaces in Multithreaded Embedded Architectures

Exploiting thread-level parallelism (TLP) in embedded systems is a challenge for software developers: they must take advantage of the available cores while also keeping energy consumption low. To speed up development and make parallelization as transparent as possible, software designers use parallel programming interfaces (PPIs). However, as will be shown in this paper, each PPI exchanges data between threads in a different way, which affects performance, energy consumption, and energy-delay product (EDP), and the effect varies across embedded processors. By evaluating four PPIs on three multicore processors, we demonstrate that simply switching the PPI can save up to 62% in energy consumption and improve EDP by up to 88%. We also show that efficiency (i.e., the best possible use of the available resources) decreases as the number of threads increases in almost all cases, but at distinct rates.
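
The metrics named in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' measurement code. It assumes the conventional definition of EDP as energy multiplied by execution time, and it assumes efficiency is computed as speedup divided by thread count, which the abstract does not state explicitly. All function names and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def edp(energy_joules, time_seconds):
    """Energy-delay product: energy multiplied by execution time (lower is better)."""
    return energy_joules * time_seconds


def relative_improvement(baseline, candidate):
    """Fraction by which `candidate` is lower than `baseline` (0.25 means 25% lower)."""
    return (baseline - candidate) / baseline


def parallel_efficiency(serial_time, parallel_time, threads):
    """Speedup divided by thread count (assumed definition; 1.0 is ideal scaling)."""
    return serial_time / (parallel_time * threads)


if __name__ == "__main__":
    # Hypothetical measurements for the same workload under two PPIs.
    # These are NOT results from the paper.
    ppi_a = {"energy": 10.0, "time": 2.0}
    ppi_b = {"energy": 5.0, "time": 1.0}

    edp_a = edp(ppi_a["energy"], ppi_a["time"])
    edp_b = edp(ppi_b["energy"], ppi_b["time"])

    print("energy saving:", relative_improvement(ppi_a["energy"], ppi_b["energy"]))
    print("EDP improvement:", relative_improvement(edp_a, edp_b))

    # Hypothetical scaling point: 8.0 s serial, 2.5 s with 4 threads.
    print("efficiency at 4 threads:", parallel_efficiency(8.0, 2.5, 4))
```

Because EDP multiplies energy by time, a PPI that lowers both compounds the gain, which is one way an EDP improvement can exceed the corresponding energy saving.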
