Paper Information - OpenMP: Memory, Devices, and Tasks

Controlling the propagation of round-off errors is crucial in numerical simulations, because these errors can significantly affect computed results, especially in parallel codes such as OpenMP programs. In this paper, we present a new version of the CADNA library that enables the numerical validation of OpenMP codes. At a reasonable cost in execution time, it estimates which digits of the computed results are affected by round-off errors and detects numerical instabilities that may occur during execution. The benefits of this OpenMP-enabled CADNA version are demonstrated on various applications, along with performance results on multi-core and many-core (Intel Xeon Phi) architectures.
