The Performance Limits of Locality Information Usage in Shared-Memory Multiprocessors

Large caches in scalable shared-memory architectures avoid high memory access times only if the referenced data lies within the address scope of the cache. Consequently, locality is the key issue in multiprocessor performance. While contemporary schedulers still base their decisions on CPU utilization, we propose novel scheduling policies based on locality information derived from cache miss counters. A locality-conscious scheduler can reduce the cost of reloading the cache after each context switch, so the potential benefit of using locality information increases with the frequency of scheduling decisions. Lightweight threads have become a common abstraction in programming languages and operating systems. User-level schedulers make frequent context switches affordable and therefore profit most from locality information, provided the lifetime of cache lines exceeds the scheduling cycle. This paper examines the performance implications of using locality information in thread scheduling algorithms for scalable shared-memory multiprocessors. A prototype implementation shows that a locality-conscious scheduler outperforms approaches that ignore locality information.
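To make the idea of deriving locality information from cache miss counters concrete, the following is a minimal sketch and not the paper's actual policy. It assumes a simple footprint model: misses taken by the running thread add to its own estimated cache footprint and proportionally evict the lines of other threads on the same CPU. The names `CACHE_LINES`, `Thread`, `account_run`, and `pick_next` are hypothetical, as is the cache capacity.

```python
from dataclasses import dataclass, field

CACHE_LINES = 1024  # assumed cache capacity in lines (hypothetical value)


@dataclass
class Thread:
    name: str
    # Estimated number of this thread's lines still resident, keyed by CPU id.
    footprint: dict = field(default_factory=dict)


def account_run(running, others, cpu, misses):
    """Update footprint estimates after `running` took `misses` misses on `cpu`."""
    # Assumption: each miss evicts a uniformly chosen line, so the footprints
    # of the other threads shrink by the fraction of the cache that was replaced.
    survive = max(0.0, 1.0 - misses / CACHE_LINES)
    for t in others:
        t.footprint[cpu] = t.footprint.get(cpu, 0.0) * survive
    # The running thread's footprint grows by its misses, capped at the cache size.
    old = running.footprint.get(cpu, 0.0)
    running.footprint[cpu] = min(CACHE_LINES, old + misses)


def pick_next(ready, cpu):
    """Choose the ready thread with the largest estimated footprint on `cpu`."""
    return max(ready, key=lambda t: t.footprint.get(cpu, 0.0))
```

In this sketch a user-level scheduler would read the miss counter at each context switch, call `account_run` with the difference since the previous switch, and then call `pick_next` for the CPU that became free. A thread whose lines have mostly been evicted loses its preference for that CPU, which mirrors the condition that locality information pays off only while cache lines outlive the scheduling cycle.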
