Scheduling algorithms with bus bandwidth considerations for SMPs

The bus that connects processors to memory is known to be a major architectural bottleneck in SMPs. However, both software and scheduling policies for these systems generally focus on memory hierarchy optimizations and do not address bus bandwidth limitations directly. We first present experimental results indicating that bus saturation can slow applications down by a factor of almost three. Motivated by these results, we introduce two scheduling policies that take into account the bus bandwidth consumption of applications. The necessary information is provided by the performance monitoring counters present in all modern processors. Our algorithms organize jobs so that processes with high-bandwidth and low-bandwidth demands are co-scheduled, improving bus bandwidth utilization without saturating the bus. We found that our scheduler is effective for applications with bandwidth requirements ranging from very low to close to the saturation limit. We also tuned the scheduler for robustness in the presence of bursts of high bus bandwidth consumption from individual jobs. The new scheduling policies improve system throughput by up to 68% (26% on average) compared with the standard Linux scheduler.
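The sketch below illustrates the co-scheduling idea described in the abstract, not the authors' actual implementation: processes are greedily paired so that the combined bus-bandwidth demand of each pair stays below an assumed saturation threshold. The threshold value, the per-process bandwidth figures, and the pairing heuristic are illustrative assumptions; in a real system the bandwidth estimates would be derived from hardware performance monitoring counters.

```python
# Illustrative sketch (not the paper's implementation): greedily co-schedule
# high-bandwidth with low-bandwidth processes so that each pair's combined
# bus-bandwidth demand stays below an assumed saturation threshold.

from typing import List, Tuple

BUS_SATURATION_MBPS = 1600.0  # assumed bus capacity; platform-specific


def pair_by_bandwidth(procs: List[Tuple[int, float]]) -> List[Tuple[int, int]]:
    """Pair high-bandwidth with low-bandwidth processes.

    procs: list of (pid, measured_bandwidth_in_MBps), as would be estimated
    from hardware performance counters.
    Returns a list of (pid_a, pid_b) pairs to co-schedule; -1 marks an
    unfilled slot.
    """
    ordered = sorted(procs, key=lambda p: p[1])  # ascending bandwidth
    pairs = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        low_pid, low_bw = ordered[lo]
        high_pid, high_bw = ordered[hi]
        if low_bw + high_bw <= BUS_SATURATION_MBPS:
            pairs.append((low_pid, high_pid))
            lo += 1
            hi -= 1
        else:
            # The high-bandwidth process alone is near saturation;
            # run it without a partner this quantum.
            pairs.append((high_pid, -1))
            hi -= 1
    if lo == hi:
        pairs.append((ordered[lo][0], -1))
    return pairs


if __name__ == "__main__":
    # Hypothetical per-process bandwidth measurements (MB/s).
    workload = [(101, 1400.0), (102, 150.0), (103, 900.0), (104, 600.0)]
    print(pair_by_bandwidth(workload))
```

A straightforward variation of this pairing heuristic could be re-evaluated every scheduling quantum, which is also how a scheduler could remain robust to bursts of bandwidth consumption from individual jobs.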
