Task Parallel Models Based on Dynamic Data Placement to Reduce NUMA Effects

NUMA (Non-Uniform Memory Access) multicore computers become popular in scientific and industrial fields due to its scalable memory performance. However, large-scale intensive data computing on NUMA architecture are facing up to the challenges in data locality problems called NUMA effects that are caused by the overhead accesses of cross-node data. Our task parallel model bases on the strategy of dynamic data placement improving system performance by reducing the frequently data access to remote memory and also by keeping load balance between each NUMA domain. The task parallel models involved OpenMP, numactl and libnuma. The evaluation demonstrates that the benchmarks using our task parallel models on a 32-core NUMA computer with various workloads achieve system performance improvement by 50% at least.