Using OpenMP superscalar for parallelization of embedded and consumer applications

In recent years, research and industry have introduced several parallel programming models to simplify the development of parallel applications. Task-based programming models, which promise ease of use, portability, and high performance, are a popular class among them. A novel model in this class, OpenMP Superscalar (OmpSs), combines advanced features such as automatic runtime dependency resolution with simple pragma-based programming for C/C++. OmpSs has proven effective at exploiting parallelism in HPC workloads. Embedded and consumer applications, however, are still mainly parallelized using traditional thread-based programming models. In this work, we investigate how effective OmpSs is for embedded and consumer applications in terms of usability and performance. To assess the usability of OmpSs, we show in detail how to implement complex parallelization strategies, such as those used in parallel H.264 decoding. To evaluate its performance, we created a collection of ten embedded and consumer benchmarks parallelized with both OmpSs and Pthreads.
