An Adaptive OpenMP Loop Scheduler for Hyperthreaded SMPs

Hyperthreaded (HT) and simultaneous multithreaded (SMT) processors are now available in commodity workstations and servers. This technology is designed to increase throughput by executing multiple concurrent threads on a single physical processor. The threads share the processor's functional units and on-chip memory hierarchy so that resources that would otherwise sit idle are put to use. This work focuses on tuning the behavior of OpenMP applications executing on symmetric multiprocessors (SMPs) built from SMT processors. We propose a self-tuning OpenMP loop scheduler that reacts to behavior caused by inter-thread data locality, instruction mix, and SMT-related load imbalance. For each parallel loop, this adaptive scheduler automatically selects the number of threads to use and a good scheduling policy for the iterations. We show that this scheduler outperforms all other OpenMP schedulers. Because it can dynamically select the number of threads for each parallel region, it even outperforms the best combination of runtime schedulers for any fixed number of threads.
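
As a rough illustration of what selecting a thread count and a scheduling policy per loop can look like, the following C sketch times one invocation of a loop under each of several candidate configurations and reports the fastest. It is not the scheduler proposed in this work: the names `loop_config`, `timed_loop`, and `pick_fastest`, the example loop body, and the one-trial-per-candidate search are all assumptions made for illustration. Only the standard OpenMP calls `omp_set_schedule` and `omp_get_wtime` and the `num_threads` and `schedule(runtime)` clauses are used, with serial stand-ins so the file still builds when OpenMP is disabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption: lets the sketch build without OpenMP). */
#include <time.h>
typedef enum { omp_sched_static = 1, omp_sched_dynamic = 2,
               omp_sched_guided = 3 } omp_sched_t;
static void omp_set_schedule(omp_sched_t kind, int chunk) { (void)kind; (void)chunk; }
static double omp_get_wtime(void) { return (double)clock() / CLOCKS_PER_SEC; }
#endif

/* Hypothetical description of one candidate configuration for a loop. */
typedef struct {
    int threads;       /* team size to request for the parallel loop */
    omp_sched_t kind;  /* static, dynamic, or guided */
    int chunk;         /* chunk size; a value below 1 selects the default */
} loop_config;

/* Run one invocation of an example loop under cfg.
 * Returns the loop's result and stores the elapsed seconds in *seconds. */
long long timed_loop(loop_config cfg, int n, double *seconds) {
    assert(cfg.threads > 0 && seconds != NULL);
    long long sum = 0;

    /* schedule(runtime) below picks up whatever is set here. */
    omp_set_schedule(cfg.kind, cfg.chunk);

    double start = omp_get_wtime();
    #pragma omp parallel for num_threads(cfg.threads) schedule(runtime) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += i;  /* placeholder work; a real loop body goes here */
    }
    *seconds = omp_get_wtime() - start;
    return sum;
}

/* Time each candidate once and return the index of the fastest one. */
size_t pick_fastest(const loop_config *cands, size_t count, int n) {
    size_t best = 0;
    double best_time = 0.0;
    for (size_t c = 0; c < count; c++) {
        double t;
        (void)timed_loop(cands[c], n, &t);
        if (c == 0 || t < best_time) {
            best = c;
            best_time = t;
        }
    }
    return best;
}
```

A caller would fill an array of `loop_config` candidates, for example one thread per physical processor versus one per logical processor, each paired with a static, dynamic, or guided schedule, and pass it to `pick_fastest`. A single timing per candidate is noisy, so a practical version would presumably average over several invocations of the same loop and re-check its choice as the program's behavior changes.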
