Assessing the Performance of the SRR Loop Scheduler with Irregular Workloads

Abstract

The input workload of an irregular application must be evenly distributed among its threads to achieve high performance. To address this need in OpenMP, several loop scheduling strategies have been proposed. While having an ever-increasing number of strategies at one's disposal is helpful, selecting the best one for a particular application has become a non-trivial task. This challenge becomes easier to tackle when existing scheduling strategies are extensively evaluated. Therefore, in this paper, we present a performance and scalability evaluation of the recently proposed loop scheduling strategy named Smart Round-Robin (SRR). To deliver a comprehensive analysis, we coupled a synthetic kernel benchmarking technique with several rigorous statistical tools, and we considered OpenMP's Static and Dynamic loop schedulers as our baselines. Our results revealed that SRR performs better on irregular applications with symmetric workloads and coarse-grained parallelization, achieving speedups of up to 1.9x and 1.5x over OpenMP's Static and Dynamic schedulers, respectively.
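To illustrate why the choice of loop scheduler matters for irregular workloads, the following Python sketch models the two baselines named above. It is a simplified cost model, not the paper's benchmark and not an implementation of SRR. It assumes that per-iteration costs are known in advance and ignores scheduling overhead, cache, and NUMA effects. The function names and the triangular workload (iteration i costs i units) are hypothetical. The Static model uses one common block split; the exact split for `schedule(static)` without a chunk size is implementation-defined. Speedup is computed as the baseline time divided by the candidate time.

```python
import heapq


def static_makespan(costs, num_threads):
    """Hypothetical model of schedule(static) with no chunk size:
    contiguous blocks of roughly equal iteration counts, one per thread."""
    n = len(costs)
    base, extra = divmod(n, num_threads)
    loads = []
    start = 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)  # first `extra` threads get one more
        loads.append(sum(costs[start:start + size]))
        start += size
    return max(loads)  # the slowest thread determines the loop's finish time


def dynamic_makespan(costs, num_threads, chunk=1):
    """Hypothetical model of schedule(dynamic, chunk): the next chunk goes
    to whichever thread becomes idle first. Scheduling overhead is ignored."""
    finish_times = [0.0] * num_threads  # min-heap of per-thread finish times
    heapq.heapify(finish_times)
    for start in range(0, len(costs), chunk):
        idle_at = heapq.heappop(finish_times)  # earliest idle thread
        heapq.heappush(finish_times, idle_at + sum(costs[start:start + chunk]))
    return max(finish_times)


def speedup(baseline_time, candidate_time):
    """Speedup of a candidate scheduler over a baseline (e.g., 1.9x)."""
    return baseline_time / candidate_time


if __name__ == "__main__":
    costs = list(range(1, 101))  # hypothetical irregular (triangular) workload
    t_static = static_makespan(costs, 4)
    t_dynamic = dynamic_makespan(costs, 4)
    print(t_static, t_dynamic, round(speedup(t_static, t_dynamic), 2))
```

With this workload, the Static model assigns the most expensive block (iterations 76 to 100) to a single thread, so its makespan is 2200 units, while the Dynamic model stays close to the ideal 5050 / 4 units. Real schedulers also pay a per-chunk dispatch cost that this model omits, which is one reason coarse-grained versus fine-grained parallelization affects the comparison.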
