Three dimensions of scheduling

By Frank Bellosa

The research presented in this dissertation concentrates on gathering and using memory access patterns in user-level as well as kernel-level scheduling, covering both time-sharing and real-time aspects. Besides the questions of when a task has to be executed and which CPU should be used, the freedom of scheduling is extended to a third dimension: the speed of execution. To alleviate the cache effects perceivable after a context switch, Follow-On Scheduling extracts information gathered by the virtual memory subsystem to identify threads that share many pages. These related threads are scheduled so that they follow one another when executed. The benefit of using memory access patterns at the coarse level of page access lies in reducing the number of cache misses a thread experiences after the switch if a related thread has run on the same CPU before. In architectures with non-uniform memory access, the trade-off between scheduling overhead and the performance gain from better locality of reference favors strategies that use memory access patterns at the level of cache access. A promising strategy using cache-miss information is based on a Markov model that estimates the cost of re-establishing a thread's footprint in the cache after the thread is restarted. This strategy offers the best process reordering and makes a fine-grained, architecture-independent programming style possible. A novel approach, called Process Cruise Control, effectively isolates real-time threads from the timing and memory-access characteristics of other threads running on different processing units in a multiprocessor environment. Process Cruise Control avoids the harmful effects of memory preemption through a complete memory-bandwidth reservation scheme based on information derived from memory-access counters in the hardware. The execution speed of soft real-time applications is maintained while other applications with high memory demands are throttled in their speed of execution.
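
The idea behind Follow-On Scheduling, running threads that share many pages directly after one another, can be illustrated with a small sketch. This is not the dissertation's algorithm: the function name `follow_on_order`, the greedy selection rule, and the hand-written page sets are assumptions made for illustration, and a real scheduler would derive the page sets from the virtual memory subsystem.

```python
def follow_on_order(pages_by_thread, first):
    """Greedy run order: each next thread is the one sharing the most
    pages with the thread scheduled just before it (hypothetical rule)."""
    remaining = set(pages_by_thread) - {first}
    order = [first]
    while remaining:
        prev_pages = pages_by_thread[order[-1]]
        # Iterate in sorted order so ties resolve to the smallest thread name.
        nxt = max(sorted(remaining),
                  key=lambda t: len(prev_pages & pages_by_thread[t]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Illustrative page sets (page numbers touched by each thread).
pages = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {9, 10},
    "D": {5, 6, 9},
}
order = follow_on_order(pages, "A")
print(order)  # ['A', 'B', 'D', 'C']
```

In this toy input, B follows A because they share two pages, D follows B because they share one, and C comes last. The greedy rule is only one possible way to turn page-sharing information into an order; it ignores scheduling overhead, priorities, and multiple CPUs.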
