A unified model and implementation for interprocess communication in a multiprocessor environment

This paper describes interprocess communication and process dispatching on the Intel 432. The primary assets of the facility are its generality and its usefulness in a wide range of applications. The conceptual model, supporting mechanisms, available interfaces, current implementations, and absolute and comparative performance are described. The Intel 432 is an object-based multiprocessor. There are two processor types: General Data Processors (GDPs) and Interface Processors (IPs). These processors provide several operating system functions in hardware by defining and using a number of processor-recognized objects and high-level instructions. In particular, they use several types of processor-recognized objects to provide a unified structure for both interprocess communication and process dispatching. One of the prime motivations for providing this level of hardware support is to improve the efficiency of these facilities over similar facilities implemented in software. With greater efficiency, they become more useful in practice [Stonebraker 81]. The unification allows these traditionally separate facilities to be described by a single conceptual model and implemented by a single set of mechanisms. The 432 model is based on using objects to play roles. The roles are those of requests and servers. In general, a request is a petition for some service and a server is an agent that performs the requested service. Various types of objects are used to represent role-players. The role played by an object may change over time. The type and state of an object determine what role it is playing at any given instant. For any particular class of request, based upon type and state, there is typically a corresponding class of expected server. The request/server model may be applied to a number of common communication situations.
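The request/server pairing described above can be illustrated with a minimal Python sketch. It is not the 432's actual object layout: the `Port` class, its `holds` field, and the method names are hypothetical, and the sketch assumes only that a port queues whichever role arrives first and pairs it with the complementary role when one shows up.

```python
from collections import deque


class Port:
    """Toy port (hypothetical): holds waiting requests or waiting servers, never both."""

    def __init__(self):
        self.queue = deque()
        self.holds = None  # "requests", "servers", or None when the queue is empty

    def _take(self):
        # Remove the oldest waiting item and reset the role once the queue drains.
        item = self.queue.popleft()
        if not self.queue:
            self.holds = None
        return item

    def offer_request(self, request):
        """Pair the request with a waiting server, or queue it if none is waiting."""
        if self.holds == "servers":
            return (request, self._take())
        self.queue.append(request)
        self.holds = "requests"
        return None

    def offer_server(self, server):
        """Pair the server with a waiting request, or queue it if none is waiting."""
        if self.holds == "requests":
            return (self._take(), server)
        self.queue.append(server)
        self.holds = "servers"
        return None


# Communication reading: a message is the request, a receiving process the server.
comm = Port()
comm.offer_request("msg-1")              # no server yet, so the message waits
print(comm.offer_server("process-A"))    # ('msg-1', 'process-A')

# Dispatching reading: a ready process is the request, an idle processor the server.
dispatch = Port()
dispatch.offer_server("processor-0")     # no work yet, so the processor waits
print(dispatch.offer_request("process-A"))  # ('process-A', 'processor-0')
```

In this sketch the same object, `"process-A"`, plays the server role at one port and the request role at another, which is one way to read the statement that an object's role may change over time.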
In the full paper, several situations are discussed: one-way requestor to server, two-way requestor to server to requestor, nondistinguished requestors, resource source selectivity, nondistinguished servers, and mutual exclusion. While the model embodies most of the essential aspects of the 432's interprocess communication and process dispatching facilities, it leaves a great many practical questions unanswered. The full paper describes our solutions to the problems that often stand between an apparently good model and a successful implementation, namely: binding, queue structure, queuing disciplines, blocking, vectoring, dispatching mixes, and hardware/software cooperation. With an understanding of the mechanisms employed, the paper then reviews the instruction interface to the port mechanism and its potential uses. This instruction interface is provided by seven instructions: SEND, RECEIVE, CONDITIONAL SEND, CONDITIONAL RECEIVE, SURROGATE SEND, SURROGATE RECEIVE, and DELAY. The implementations of the port mechanism are then discussed. The port mechanism is implemented in microcode on both the GDP and IP. Although the microarchitectures differ, in both cases the implementation requires between 600 and 800 lines of vertically encoded 16-bit microinstructions. The corresponding execution times are roughly comparable, with the IP about 20% slower even though most of its microinstructions are twice as slow. Both implementations resulted from the hand translation of the Ada-based algorithms that describe these operations. Finally, the paper characterizes the performance of the 432 port mechanism and contrasts it with other implementations of similar mechanisms. Three recently implemented mechanisms were chosen: one implemented completely in software (i.e., the Exchange mechanism of RMX/86 [Intel 80]) and two implemented in a combination of hardware and software (i.e., the StarOS and Medusa mechanisms of Cm* [Jones 80]).
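As a rough software analogy for four of the seven port instructions named above, the following Python sketch uses the standard library's bounded `queue.Queue`. It is an assumption-laden illustration, not the microcoded implementation: it assumes that SEND blocks when the port is full, that RECEIVE blocks when it is empty, and that the conditional forms report failure instead of blocking. The function names and the capacity of 1 are hypothetical, and SURROGATE SEND, SURROGATE RECEIVE, and DELAY are not modeled.

```python
import queue
import threading


def send(port, message):
    """Analogy for SEND: waits while the port is full (assumed behavior)."""
    port.put(message)


def receive(port):
    """Analogy for RECEIVE: waits until a message is available."""
    return port.get()


def conditional_send(port, message):
    """Analogy for CONDITIONAL SEND: never waits, reports success or failure."""
    try:
        port.put_nowait(message)
        return True
    except queue.Full:
        return False


def conditional_receive(port):
    """Analogy for CONDITIONAL RECEIVE: never waits, returns (ok, message)."""
    try:
        return True, port.get_nowait()
    except queue.Empty:
        return False, None


port = queue.Queue(maxsize=1)            # hypothetical capacity of one message
print(conditional_receive(port))         # (False, None): nothing queued yet
print(conditional_send(port, "a"))       # True: the port had room
print(conditional_send(port, "b"))       # False: the port is full
print(receive(port))                     # a

# A blocking receive on an empty port is satisfied by a later send.
empty = queue.Queue(maxsize=1)
got = []
receiver = threading.Thread(target=lambda: got.append(receive(empty)))
receiver.start()
send(empty, "wake")
receiver.join()
print(got)                               # ['wake']
```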
To make the comparison as fair as possible, times for each system are normalized to account for differences in their underlying hardware. The normalization factor is called a “tick” (similar to [Lampson 80]). In the full paper, the absolute and normalized performance of these implementations is examined in six different cases: conditional send time, conditional receive time, minimum message transit time, send plus minimum dispatching latency time, non-blocking send time, and blocking receive time. These performance comparisons show a 3 to 7x normalized performance advantage over the software-implemented RMX/86 mechanism. Normalized performance is similar to that of the Cm* implementations.
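The arithmetic behind such a normalized comparison can be sketched as follows. The abstract does not define the tick, so this sketch assumes it is a per-system time unit and that a normalized cost is the absolute time divided by that unit. The function names, the systems `system_a` and `system_b`, and all numbers are placeholders chosen for illustration, not measurements from the paper.

```python
def normalize(absolute_us, tick_us):
    """Express an absolute time in ticks (assumed definition: time / tick length)."""
    if tick_us <= 0:
        raise ValueError("tick length must be positive")
    return absolute_us / tick_us


def advantage(baseline_ticks, candidate_ticks):
    """How many times fewer ticks the candidate needs than the baseline."""
    return baseline_ticks / candidate_ticks


# Placeholder figures only: a system with a short tick and one with a long tick.
system_a = normalize(absolute_us=300.0, tick_us=0.5)   # 600.0 ticks
system_b = normalize(absolute_us=400.0, tick_us=2.0)   # 200.0 ticks

print(system_a, system_b)             # 600.0 200.0
print(advantage(system_a, system_b))  # 3.0
```

With these placeholder values, `system_b` is slower in absolute time yet needs a third as many ticks, which is the kind of hardware-independent comparison the normalization is meant to allow.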