OpenMP: Memory, Devices, and Tasks

It is crucial to control round-off error propagation in numerical simulations, because round-off errors can significantly affect computed results, especially in parallel codes such as those written with OpenMP. In this paper, we present a new version of the CADNA library that enables the numerical validation of OpenMP codes. At a reasonable cost in execution time, it estimates which digits of the computed results are affected by round-off errors and detects numerical instabilities that may occur during execution. The benefit of this new OpenMP-enabled version of CADNA is demonstrated on various applications, along with performance results on multi-core and many-core (Intel Xeon Phi) architectures.
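To illustrate why round-off errors matter particularly in parallel codes, the following minimal C sketch compares a left-to-right sum with a sum computed as per-chunk partial sums that are then combined, which is roughly how a statically scheduled parallel reduction groups its operands. It does not use CADNA or OpenMP; the function names `sum_sequential` and `sum_chunked` are hypothetical and chosen only for this illustration, and the code assumes IEEE 754 single precision without excess intermediate precision.

```c
#include <stdio.h>

/* Sum n floats strictly left to right, as a sequential loop would. */
static float sum_sequential(const float *x, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i) {
        s += x[i];
    }
    return s;
}

/* Sum n floats as `chunks` contiguous partial sums that are then combined.
 * This only mimics the operand grouping of a parallel reduction; it runs
 * sequentially and involves no threads. */
static float sum_chunked(const float *x, int n, int chunks) {
    float total = 0.0f;
    for (int c = 0; c < chunks; ++c) {
        int lo = (int)((long)n * c / chunks);
        int hi = (int)((long)n * (c + 1) / chunks);
        float partial = 0.0f;
        for (int i = lo; i < hi; ++i) {
            partial += x[i];
        }
        total += partial;
    }
    return total;
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}`, whose exact sum is 2, `sum_sequential` returns 1 and `sum_chunked` with two chunks returns 0 under the stated assumptions, because each addition of 1 to a value of magnitude 1e8 is lost to rounding. Neither result is exact, and the two differ only because the operands were grouped differently, which is the kind of effect a numerical validation tool aims to expose.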
