Data and Thread Affinity in OpenMP Programs

The slogan of last year's International Workshop on OpenMP was "A Practical Programming Model for the Multi-Core Era", yet OpenMP itself remains fully agnostic of the hardware architecture. As a consequence, the programmer is left alone with poor performance whenever threads and the data they access end up in different parts of the machine. In this work we examine the options available to the programmer for improving data and thread affinity in OpenMP programs, using several toy applications, and show how the lessons learned apply to larger application codes. We also fill a gap by implementing explicit data migration on Linux, which provides a next-touch mechanism.
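To make the affinity problem concrete, the following C fragment is a minimal sketch of one common technique. It is not code from this work. It assumes a NUMA system on which the operating system places a memory page near the thread that first writes to it, which is the usual default policy on Linux. The function names `alloc_first_touch` and `sum_same_partition` are hypothetical and chosen only for illustration. The idea is to initialize the data in parallel with the same static loop partitioning that the later compute loop uses, so that each thread mostly works on pages located close to it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles and initialize them in parallel.
 * Assumption: the OS uses a first-touch page placement policy, so each page
 * ends up near the thread that writes it first. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* schedule(static) without a chunk size gives each thread one
     * contiguous block of iterations. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        a[i] = 1.0;

    return a;
}

/* Hypothetical compute loop: it uses the same static partitioning as the
 * initialization, so each thread reads the block it touched first. */
double sum_same_partition(const double *a, size_t n)
{
    double sum = 0.0;

    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < (long)n; ++i)
        sum += a[i];

    return sum;
}
```

The sketch only helps if threads stay where they are. Binding them, for example through the standard `OMP_PROC_BIND` environment variable in newer OpenMP versions or through vendor-specific settings, is a separate step. If the access pattern changes during the run, the initial placement no longer fits, and this is the situation in which a next-touch migration mechanism becomes relevant. Without OpenMP support enabled in the compiler, the pragmas are ignored and the code runs serially.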
