Improving memory bandwidth utilization using OpenMP tasks with processor affinity

CPU design has been evolving for more than 30 years, since the first x86 microprocessor. Recently the focus has shifted from raising single-core performance to multi-core architectures. Multi-core processor technology is advancing rapidly, but the memory interface remains a limiting factor in satisfying the demands of multi-core, multi-threaded processors, which poses a major challenge for software developers. At run time, threads are dynamically assigned to processor cores by the operating system scheduler, and current parallel programming research aims mainly at load balancing and keeping all cores busy. As a result, applications may exhibit poor spatial data locality, and memory bandwidth may be used unevenly because of differences in memory access paths. Maximizing memory bandwidth utilization by controlling thread placement through processor affinity is the main scope of this research. A 62% improvement in memory bandwidth utilization (from 8786.87 MB/s to 14201.88 MB/s) was achieved when appropriate processor affinity was set for thread placement. Combining OpenMP task-level parallelism with processor affinity yielded a 69% improvement (from 8786.87 MB/s to 14802.69 MB/s) using 2 threads. Task-level parallelism combined with processor affinity thus greatly increases the level of parallelism available in an OpenMP parallel programming environment and can improve the overall performance of parallel applications.
