A Lightweight OpenMP 4 Run-time for Embedded Systems

OpenMP is increasingly being adopted by current many-core embedded processors to exploit their parallel computation capabilities. Unfortunately, current run-time implementations of the latest specification (v4.0) are not suitable for processors relying on small and fast on-chip memories, due to its memory consumption. This paper proposes an OpenMP4 run-time that reduces the memory consumption while providing the same performance. Our run-time relies on a new compiler pass capable to generate the task dependency graph of OpenMP programs, which is then efficiently stored in memory.

[1]  Sanjoy K. Baruah Improved Multiprocessor Global Schedulability Analysis of Sporadic DAG Task Systems , 2014, 2014 26th Euromicro Conference on Real-Time Systems.

[2]  Luca Benini,et al.  P2012: Building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[3]  Jakob Engblom,et al.  The worst-case execution-time problem—overview of methods and survey of tools , 2008, TECS.

[4]  Gurindar S. Sohi,et al.  Task selection for a multiscalar processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[5]  Fernando Magno Quintão Pereira,et al.  A fast and low-overhead technique to secure programs against integer overflows , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[6]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[7]  Eduardo Quiñones,et al.  Timing characterization of OpenMP4 tasking model , 2015, 2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).

[8]  Sebastian Thrun,et al.  Towards fully autonomous driving: Systems and algorithms , 2011, 2011 IEEE Intelligent Vehicles Symposium (IV).

[9]  Sara Royuela,et al.  Compiler analysis for OpenMP tasks correctness , 2015, Conf. Computing Frontiers.

[10]  Alistair P. Rendell,et al.  OpenMP on the Low-Power TI Keystone II ARM/DSP System-on-Chip , 2013, IWOMP.

[11]  Vivek Sarkar,et al.  Compile-time partitioning and scheduling of parallel programs , 1986, SIGPLAN '86.

[12]  Pascal Sainrat,et al.  WCET Analysis of a Parallel 3D Multigrid Solver Executed on the MERASA Multi-Core , 2010, WCET.

[13]  Benoît Dupont de Dinechin,et al.  Time-critical computing on a single-chip massively parallel processor , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[14]  Eduardo Quiñones,et al.  OpenMP and timing predictability: A possible union? , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[15]  Thomas Hérault,et al.  DAGuE: A Generic Distributed DAG Engine for High Performance Computing , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.

[16]  Cheng Wang,et al.  libEOMP: a portable OpenMP runtime library based on MCA APIs for embedded systems , 2013, PMAM '13.

[17]  Dimitrios S. Nikolopoulos,et al.  BDDT: Block-level Dynamic Dependence Analysis for Deterministic Task-Based , 2012 .

[18]  Luca Benini,et al.  Enabling fine-grained OpenMP tasking on tightly-coupled shared memory clusters , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[19]  Paraskevas Evripidou,et al.  Combining Compile and Run-Time Dependency Resolution in Data-Driven Multithreading , 2011, 2011 First Workshop on Data-Flow Execution Models for Extreme Scale Computing.

[20]  Nerma Baščelija Sequential and Parallel Algorithms for Cholesky Factorization of Sparse Matrices , 2013 .

[21]  Jesús Labarta,et al.  Parallelizing dense and banded linear algebra libraries using SMPSs , 2009, Concurr. Comput. Pract. Exp..

[22]  Nicholas Nethercote,et al.  "Building Workload Characterization Tools with Valgrind" , 2006, 2006 IEEE International Symposium on Workload Characterization.