Towards Robust Data-Driven Parallel Loop Scheduling Using Bayesian Optimization

Efficient parallelization of loops is critical to the performance of high-performance computing applications, and many classical parallel loop scheduling algorithms have been developed to improve parallel efficiency. More recently, workload-aware methods have been proposed to exploit the structure of workloads. However, both classical and workload-aware scheduling methods lack what we call robustness: most of them are either unpredictable in performance or favor specific workload patterns, forcing application developers to spend additional effort finding the best-suited algorithm or tuning scheduling parameters. This paper proposes Bayesian Optimization augmented Factoring Self-Scheduling (BO FSS), a robust data-driven parallel loop scheduling algorithm. BO FSS uses Bayesian optimization (BO), a machine-learning-based optimization method, to turn the classical Factoring Self-Scheduling (FSS) algorithm into a robust, adaptive method that automatically adapts to a wide range of workloads. To compare the performance and robustness of our method, we implemented BO FSS and other loop scheduling methods in the OpenMP framework, and we quantify robustness with a regret-based metric called performance regret. Extensive benchmarks show that BO FSS performs well across most workload patterns and is highly robust relative to the other scheduling methods, achieving an average performance regret of 4%: even when BO FSS is not the best-performing algorithm on a specific workload, it stays within a 4-percentage-point margin of the best performer.
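For concreteness, the two ingredients named in the abstract can be sketched in a few lines: the classic factoring rule, in which each scheduling round hands out roughly half of the remaining iterations split evenly across threads, and the performance-regret metric used to quantify robustness. This is a minimal Python sketch under stated assumptions, not the paper's implementation: the function names are illustrative, and the regret definition (relative slowdown versus the best scheduler on the same workload) is our reading of the abstract.

```python
def fss_chunks(n_iters, n_threads):
    """Classic Factoring Self-Scheduling chunk schedule: each batch
    allocates about half of the remaining iterations, divided evenly
    among the threads, so chunk sizes decay geometrically."""
    remaining = n_iters
    chunks = []
    while remaining > 0:
        # Per-thread chunk for this batch; never below one iteration.
        chunk = max(1, remaining // (2 * n_threads))
        for _ in range(n_threads):
            if remaining <= 0:
                break
            size = min(chunk, remaining)
            chunks.append(size)
            remaining -= size
    return chunks


def performance_regret(runtime, best_runtime):
    """Relative slowdown of a scheduler versus the best-performing
    scheduler on the same workload (0.04 == within 4% of the best)."""
    return (runtime - best_runtime) / best_runtime
```

In a BO-augmented variant, a Bayesian optimizer would tune the schedule's free parameter (e.g. the batch decay factor) against measured loop runtimes; the sketch above only shows the fixed classical rule that serves as the starting point.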
