Towards User-level Parallelism with Minimal Kernel Support on Mach

In order to reach an agreement between kernel and applications on scheduling decisions, we believe it is necessary to reduce the overhead of kernel scheduling and to move policies up from the kernel to the user level (libraries and subsystems). Following this trend, we have simplified the Mach 3.0 scheduler and have offered several kernel mechanisms (priorities, context switch, preemption, handoff, etc.) as library routines. A CPU server has also been implemented. We have compared the performance of the original Mach against our modified version using the SPLASH benchmarks (which we have ported from UNIX to Mach) and the WPI benchmarks.

0. INTRODUCTION

In the microkernel architecture, new layers take part in scheduling decisions. Applications have to deal with several entities (kernel, libraries and subsystems); therefore, if performance is to be maintained or improved, the overhead of kernel scheduling must be reduced. Until now, the kernel alone was responsible for thread scheduling; nowadays there are also libraries that support user-level scheduling. When this control is exercised independently in both layers, the result may be poor overall performance. Our main goal is to allow scheduling decisions to be taken at the right place, both at kernel and at user level. We focus on parallel applications running on multiprocessors. Applications know which execution flows may run and which synchronization mechanisms are needed [MARS91]; the kernel manages processors and memory efficiently. The experiences related in this work are steps towards what seems to be becoming the common CPU allocation policy: a one-to-one mapping of kernel threads to physical processors [MARK92a]. This lets the application decide which user-level thread runs at every moment on the virtual processors provided by the kernel.
The work done by Brian Bershad and Thomas Anderson [ANDE90] demonstrates that user-level threads built on top of kernel threads offer good performance for commonly used operations such as creation, termination and synchronization, but exhibit poor performance or even incorrect behavior when dealing with blocking kernel operations such as I/O, page faults and processor preemption. These authors designed a new kernel entity, the scheduler activation: an alternative to kernel threads, a vessel on which a user-level thread scheduler controls which threads run. The kernel allocates processors to jobs; scheduler activations are the mechanism by which it provides those processors to tasks. The kernel never time-slices scheduler activations, and it is responsible for notifying the user-level scheduler about system events that affect the application: blocking and unblocking of scheduler activations in the kernel, and preemption and allocation of the processors assigned to the task. Processing does not stop on an I/O or preemption block; instead, control is returned to the user-level scheduler.

This work has been supported by the Ministry of Education of Spain (CICYT) under contract TIC 89/392. OSF RI Workshop, June 93.

Scheduler activations are being ported to Mach 3.0 at the University of Washington (Seattle) [BART92]. We agree that they provide a good framework for supporting user-level management of parallelism. Our work tends towards adopting this mechanism, but so far it has focused on locating blocking points and context switches in the Mach kernel; in addition, we have ported and adapted a new event-driven scheduling policy for kernel threads, and we have exploited and extended the user-level context switch of the Cthreads package. Our Mach scheduler modifications are based on the ESCHED proposals [STRA86]. A simpler scheduler with less overhead in the kernel has been implemented without losing throughput.
1. CURRENT KERNEL SCHEDULING

1.1. Traditional Schedulers

UNIX process schedulers operate with a multilevel priority queue and a one-second system quantum. The priority of a process is based on its compute-to-real-time ratio, favoring interactive processes. Priority calculation in current UNIX schedulers occurs more frequently than in the original one; it is also more complex and considers more parameters. The clock interrupt handler recalculates priorities every 10 ms (the quantum value). A mixture of routines is involved in priority assignment: schedcpu() is a callout that recalculates the priorities of all processes every second; wakeup() always forces a rescheduling and, if the awakened process was blocked for more than a second, computes its new priority; and roundrobin() is the callout that computes priorities when the quantum is exhausted. There are few ways to adjust process priorities from the user level (only the nice() system call).

The Mach 3.0 thread scheduler mechanism draws from Mach 2.5 (based on 4.3 BSD), a monolithic version with UNIX integrated inside. However, the Mach scheduler has been conceived for multiprocessor systems, and some optimizations have been made in its design (handoff, hints, ...) and implementation (run queue hint and counter, etc.) [BLAC90] [DRAV91]. Nevertheless, priority calculation is still quite complex and difficult for applications to control.

1.2. Event-driven Scheduling (ESCHED)

In order to simplify UNIX schedulers and eliminate their shortcomings, a new scheduler for UNIX was designed and implemented at the University of Maryland: ESCHED [STRA86]. The explicit goals of this approach were to maximize system responsiveness and throughput and to permit adaptability, with responsiveness taking priority over the other two goals. In ESCHED, implemented on 4.3BSD UNIX, the CPU run queue is of the multilevel feedback type.
A new process can obtain the CPU in one of the following situations: the running process decides to block, a higher-priority process enters the run queue, or the running process exhausts its quantum and another process of higher or equal priority is in the run queue. A process's quantum begins when the process is removed from the run queue and takes control of the CPU. Higher-priority processes occupy the upper levels of the run queue and are executed before lower-priority ones. The priority of a process is increased when it performs some kind of interactive activity; the device drivers are in charge of these priority boosts whenever the operation blocks, which places considerable responsibility on device driver programmers. When a process's quantum expires, its priority is lowered. Every priority has an associated quantum, so the quantum depends directly on the priority. In order to reduce the number of context switches, higher priorities (where I/O-bound processes live) have shorter quanta and lower priorities (where CPU-bound processes live) longer ones.