Memory Affinity for Hierarchical Shared Memory Multiprocessors

Parallel platforms based on large-scale hierarchical shared memory multiprocessors with Non-Uniform Memory Access (NUMA) are becoming a trend in scientific High Performance Computing (HPC). Due to their memory access constraints, these platforms require very careful data distribution. Many solutions have been proposed to address this issue. However, most of them do not include optimizations for numerical scientific data (array data structures) and do not address portability. Moreover, these solutions provide only a restricted set of memory policies for data placement. In this paper, we describe a user-level interface named Memory Affinity interface (MAi), which allows memory affinity control on Linux-based cache-coherent NUMA (ccNUMA) platforms. Its main goals are fine-grained data control, flexibility, and portability. The performance of MAi is evaluated on three ccNUMA platforms using numerical scientific HPC applications: the NAS Parallel Benchmarks and a Geophysics application. The results show important gains (up to 31%) when compared to the default Linux solution.
