Juggle: proactive load balancing on multicore computers

We investigate proactive dynamic load balancing on multicore systems, in which threads are continually migrated to reduce the impact of processor/thread mismatches. The goal is to enhance the flexibility of the SPMD-style programming model and to enable SPMD applications to run efficiently in multiprogrammed environments. We present Juggle, a practical, decentralized, user-space implementation of a proactive load balancer that emphasizes portability and usability. Juggle shows performance improvements of up to 80% over static balancing for UPC, OpenMP, and pthreads benchmarks. We analyze the impact of Juggle on parallel applications and derive lower bounds and approximations for thread completion times. We show that results from Juggle closely match theoretical predictions across a variety of architectures, including NUMA and hyper-threaded systems. We also show that Juggle is effective in multiprogrammed environments with unpredictable interference from unrelated external applications.
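The effect of a processor/thread mismatch can be illustrated with a toy model. The sketch below is not Juggle's algorithm: it is a centralized, idealized simulation that assumes identical threads, identical cores, equal time-sharing within a core, and zero migration cost. All names in it (completion_time, shares_static, shares_migrating) are hypothetical. It compares a fixed thread-to-core assignment with one that, at every time step, moves the least-advanced threads to the least-loaded cores.

```python
from fractions import Fraction


def shares_static(active, n_cores):
    """Fixed placement: thread t always lives on core t % n_cores."""
    load = [0] * n_cores
    for t in active:
        load[t % n_cores] += 1
    # A core divides its time equally among the active threads it hosts.
    return {t: Fraction(1, load[t % n_cores]) for t in active}


def shares_migrating(active, n_cores, progress):
    """Idealized proactive placement (an assumption, not Juggle itself):
    the threads that are furthest behind get the least-loaded cores."""
    order = sorted(active, key=lambda t: progress[t])
    base, extra = divmod(len(order), n_cores)
    # (n_cores - extra) cores host `base` threads, `extra` cores host one more.
    sizes = [base] * (n_cores - extra) + [base + 1] * extra
    shares, pos = {}, 0
    for size in sizes:
        for t in order[pos:pos + size]:
            shares[t] = Fraction(1, size)
        pos += size
    return shares


def completion_time(n_threads, n_cores, migrate,
                    work=Fraction(1), dt=Fraction(1, 100)):
    """Time until the slowest thread has finished `work` units.

    In an SPMD program with a final barrier, this is the run time.
    Exact rational arithmetic avoids floating-point drift.
    """
    progress = [Fraction(0)] * n_threads
    elapsed = Fraction(0)
    while True:
        active = [t for t in range(n_threads) if progress[t] < work]
        if not active:
            return elapsed
        if migrate:
            shares = shares_migrating(active, n_cores, progress)
        else:
            shares = shares_static(active, n_cores)
        for t in active:
            progress[t] += shares[t] * dt
        elapsed += dt


# Three identical threads on two cores.
print(completion_time(3, 2, migrate=False))  # 2
print(completion_time(3, 2, migrate=True))   # 3/2
```

In this model the static run is limited by the core that hosts two threads, while the migrating run keeps both cores busy and lets all three threads advance at the same average rate. Real systems add migration overhead, cache and NUMA effects, and external interference, none of which this sketch captures.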
