Time in Distributed Real-Time Systems

A real-time music system is responsible for deciding what happens when: when each task runs and when each message takes effect. This question becomes acute when there are several classes of tasks running and intercommunicating: user interface, control processing, and audio, for example. We briefly examine and classify past approaches and their applicability to distributed systems, then propose and discuss an alternative. The shared access to a sample clock that it requires is not trivial to achieve in a distributed system, so we describe and assess a way to do so.

1 Existing approaches

The goal of a real-time music system is to produce certain time-structured data. Ideally, this output’s timing would be exactly as specified by the musician. In practice, the timing is imprecise: the system’s tasks can’t all run when they ought to, nor can they communicate with perfect timing. The music system’s job is to minimize the consequent harm to its output, through its control over how tasks are run and how they communicate. Broadly speaking, there are three approaches to this problem: synchronous, asynchronous, and our forward-synchronous.

Groups of tasks having the same timing requirements can be formalized as zones (Dannenberg and Rubine, 1995); a system might have audio, MIDI, and user-interface zones. Any zone can send data to any other: for example, UI controlling audio, audio rendered on the screen, or audio abstracted to MIDI and echoed to disk. For concreteness we’ll talk about one particularly common path, from a sequencer to a synthesizer.

What we call the synchronous approach is exemplified by MAX (Puckette, 1991). This approach interleaves the control and audio zones into a single thread of control, running each to completion. Therefore control updates always apply atomically and to the proper block of audio. Unfortunately, long-running control computation causes audio to underflow. One might prefer to defer this computation, delaying an event onset but preserving audio flow.

The asynchronous approach permits this by running audio and control in separate threads, audio having either priority or the use of an independent processor. They communicate through messages or shared memory, with updates taking effect immediately. As a result, events may be early or late, depending on what points of execution the two threads have reached. MIDI ensembles and the IRCAM 4X (Favreau et al., 1986) are examples of this approach. It extends to distributed systems, whereas the synchronous approach does not.

2 Forward-synchronous

The proposed forward-synchronous approach is so called because it has synchronous timing for events in the future and asynchronous timing for those in the past. It permits events to be late, but not early. Control messages are stamped with the time at which they are to take effect. Those arriving with a future timestamp are queued, and later dispatched synchronously with audio computation; those with a past timestamp are dispatched immediately. The timestamp refers to a particular sample, because time is based on the DAC clock:

    time = (samples written + samples buffered) / f_nom

where f_nom is the nominal clock frequency. Though the system clock is quite likely more accurate, it must not be used directly: its very accuracy constitutes a lack of synchronization with the DAC, precluding sample-precise timing.
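To make the dispatch rule and the DAC-derived timebase concrete, here is a minimal sketch in C. It is only an illustration under assumed names, not the implementation described here: FNOM, BLOCK, dac_samples_written(), dac_samples_buffered(), apply(), and the fixed-size pending queue are inventions of the example, and the audio-driver queries are replaced by stand-ins so the fragment compiles on its own.

    /* Minimal forward-synchronous receiver sketch (illustrative only).
       FNOM, BLOCK, dac_samples_written(), dac_samples_buffered() and
       apply() are assumed names; the driver queries are simulated. */
    #include <stdio.h>

    #define FNOM  44100.0          /* nominal DAC clock frequency, Hz */
    #define BLOCK 64               /* audio block size in samples     */
    #define MAXPENDING 256

    typedef struct { double when; int note; } msg_t;

    /* Stand-ins for the real audio-driver queries. */
    static long written = 0, buffered = 2 * BLOCK;
    static long dac_samples_written(void)  { return written; }
    static long dac_samples_buffered(void) { return buffered; }

    /* Time derived from the DAC clock, as in the formula above. */
    static double now(void)
    {
        return (dac_samples_written() + dac_samples_buffered()) / FNOM;
    }

    static msg_t pending[MAXPENDING];  /* queue sorted by timestamp */
    static int npending = 0;

    static void apply(const msg_t *m)  /* take effect in the synthesizer */
    {
        printf("note %d for %.4f s (dispatched at %.4f s)\n",
               m->note, m->when, now());
    }

    /* Forward-synchronous receive: past timestamps dispatch at once,
       future timestamps wait to be dispatched with the audio. */
    static void receive(msg_t m)
    {
        if (m.when <= now()) { apply(&m); return; }  /* late: asynchronous */
        if (npending >= MAXPENDING) return;          /* queue full: drop   */
        int i = npending++;
        while (i > 0 && pending[i - 1].when > m.when) {
            pending[i] = pending[i - 1];             /* keep queue sorted  */
            i--;
        }
        pending[i] = m;
    }

    /* Called once per audio block, before the block's samples are
       computed: dispatch every queued message falling in this block. */
    static void audio_block(void)
    {
        double block_end = now() + BLOCK / FNOM;
        while (npending > 0 && pending[0].when <= block_end) {
            apply(&pending[0]);
            for (int i = 1; i < npending; i++)
                pending[i - 1] = pending[i];
            npending--;
        }
        written += BLOCK;              /* pretend a block was written */
    }

    int main(void)
    {
        receive((msg_t){ .when = now() + 0.010, .note = 60 }); /* future: queued   */
        receive((msg_t){ .when = now() - 0.010, .note = 62 }); /* past: immediate  */
        for (int i = 0; i < 10; i++)
            audio_block();
        return 0;
    }

The point the sketch tries to capture is that a message’s effect is tied to a sample position, not to whenever the receiving thread happens to run.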
A sequencer can request precise timing (at the price of added latency) by sending with an offset into the future. If the synthesizer runs a little late, the messages are still in the future and still precisely timed. If it runs so late as to overrun the offset, the events start to happen late. The sequencer thus sets its own tradeoff between latency and the chance of jitter; it can rule out jitter entirely if the synthesizer’s zone has a hard scheduling bound. An accompaniment system, on the other hand, would more likely choose minimal delay and take its chances with jitter. This approach offers, for every message, the choice of synchronous or asynchronous behavior, with graceful degradation under uncooperative thread scheduling. Distributed operation requires a distributed shared timebase, discussed below.

Anderson and Kuivila (1986) also augment the asynchronous model with timestamps, but in a less flexible way. Their approach introduces a fixed latency L between the control and audio zones. Each control message is generated E milliseconds early, where E is not under the application’s control but is bounded between 0 and L. The message is then sent timestamped for E ms into the future, cancelling the scheduling jitter (sketched at the end of this section). Think of this as running the control zone L ms ahead of the audio zone. In this way we can view all of the timestamps within a globally consistent timeline. (Without this, events are acausal and time becomes deeply confusing.) Each zone is offset into the future by a different amount. Notice that this view requires that zone connectivity be all ‘downhill’, from higher offset to lower; in particular, there can be no cycles. This can be problematic, as when a system hopes to both receive from and send to the outside world.

ZIPI messages are timestamped, and the forward-synchronous approach is one option discussed by its inventors (McMillen et al., 1994). They also discuss another model in which timestamped messages are processed and forwarded early so that they arrive at their final destination on time. This is not allowed in the forward-synchronous model, which (except for late messages) always processes messages synchronously according to their timestamps. ZIPI does not seem to assume sample-precise timing and synchronization.
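For the sending side, the fragment below sketches how a sender might stamp events a fixed lookahead into the future, which is what the sequencer’s offset and Anderson and Kuivila’s fixed L both amount to. It is again only an assumed illustration: LOOKAHEAD, send_msg(), play_event(), and the simulated now() are names made up for the example, and a real sender would read the shared DAC-derived timebase rather than a local counter.

    /* Sender-side sketch (illustrative only): LOOKAHEAD, send_msg(),
       play_event() and the simulated now() are assumed names. */
    #include <stdio.h>

    #define LOOKAHEAD 0.010        /* deliberate latency, in seconds */

    typedef struct { double when; int note; } msg_t;

    /* Stand-in for the shared, DAC-derived timebase: a simulated sample
       counter that advances by one 64-sample block per call. */
    static double now(void)
    {
        static long samples = 0;
        samples += 64;
        return samples / 44100.0;
    }

    /* Stand-in for the transport (network, shared memory, ...). */
    static void send_msg(msg_t m)
    {
        printf("note %d stamped for %.4f s\n", m.note, m.when);
    }

    /* Stamp each event LOOKAHEAD seconds ahead of the current time, so
       a receiver that runs up to LOOKAHEAD late still dispatches it on
       the intended sample.  A fixed offset per zone corresponds to
       Anderson and Kuivila's L; a lookahead near zero is the
       low-latency, jitter-prone choice of an accompaniment system. */
    static void play_event(int note)
    {
        send_msg((msg_t){ .when = now() + LOOKAHEAD, .note = note });
    }

    int main(void)
    {
        play_event(60);
        play_event(64);
        return 0;
    }

Choosing the lookahead to cover the receiver’s worst-case lateness rules out jitter at the cost of latency; choosing it near zero accepts jitter in exchange for minimal delay.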