Celebrating diversity: a mixture of experts approach for runtime mapping in dynamic environments

Matching program parallelism to platform parallelism using thread selection is difficult when the environment and available resources change dynamically. Existing compiler or runtime approaches are typically based on a one-size-fits-all policy, with little ability to either evaluate or adapt the policy when encountering new external workloads or hardware resources. This paper focuses on selecting the best number of threads for a parallel application in dynamic environments. It develops a new scheme based on a mixture of experts approach. The scheme learns online which of a number of existing policies, or experts, is best suited to a particular environment, without having to try out each policy. It does this by using a novel environment predictor as a proxy for the quality of an expert thread selection policy. Additional expert policies can easily be added and are selected only when appropriate. We evaluate our scheme in environments with varying external workloads and hardware resources. We then consider the cases where workloads use affinity scheduling or are themselves adaptive, and show that our approach outperforms existing schemes in all cases and, surprisingly, also improves workload performance. On average, we improve 1.66x over the OpenMP default, 1.34x over an online scheme, 1.25x over an offline policy and 1.2x over a state-of-the-art analytic model. Determining the right number and type of experts is an open problem; our initial analysis shows that adding more experts improves accuracy and performance.
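The selection idea described above, scoring each expert by how well its environment prediction matches what is later observed and then using the best-scoring expert's thread count, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `Expert`, `ExpertSelector`, `demo_experts`, the single "load" feature, the Euclidean error and the decay factor are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Env = Sequence[float]  # hypothetical feature vector describing the environment


@dataclass
class Expert:
    """One thread selection policy paired with its own environment predictor."""
    name: str
    predict_threads: Callable[[Env], int]            # hypothetical policy
    predict_next_env: Callable[[Env], List[float]]   # hypothetical predictor
    error: float = 0.0                               # running prediction error


class ExpertSelector:
    """Picks the expert whose environment predictions have been most accurate.

    Environment-prediction error serves as a proxy for policy quality, so no
    expert's thread choice has to be tried out in order to be scored.
    """

    def __init__(self, experts: Sequence[Expert], decay: float = 0.5) -> None:
        self.experts = list(experts)
        self.decay = decay  # assumed smoothing factor for the running error
        self._pending: Optional[List[List[float]]] = None

    def step(self, env: Env) -> Tuple[str, int]:
        # Score the predictions made at the previous step against the
        # environment that is actually observed now.
        if self._pending is not None:
            for expert, predicted in zip(self.experts, self._pending):
                miss = sum((p - o) ** 2 for p, o in zip(predicted, env)) ** 0.5
                expert.error = (self.decay * expert.error
                                + (1 - self.decay) * miss)
        # Record every expert's prediction for the next step.
        self._pending = [e.predict_next_env(env) for e in self.experts]
        # Use the thread count of the expert with the lowest error so far
        # (ties go to the expert listed first).
        best = min(self.experts, key=lambda e: e.error)
        return best.name, best.predict_threads(env)


def demo_experts() -> List[Expert]:
    """Two toy experts over a one-element environment [external_load]."""
    return [
        # Assumes the load stays constant and asks for many threads.
        Expert("steady", lambda env: 8, lambda env: [env[0]]),
        # Assumes the load doubles each step and asks for few threads.
        Expert("rising", lambda env: 2, lambda env: [2 * env[0]]),
    ]
```

Calling `step` once per parallel region with fresh environment features returns the name of the chosen expert and its thread count. In this sketch a new expert is added by appending to the list passed to `ExpertSelector`; it is only selected once its environment predictions prove more accurate than the others'.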
