Performance of iterative computation in self-timed rings

The computation of iterative functions need not be limited by the rate at which external signals, such as a clock, can be supplied to an on-chip circuit. Instead, self-timed structures can compute without clock or latch delays. In particular, a self-timed ring is a loop of logical stages that, after initialization with operands, computes multiple cycles of an iterative computation without further external handshaking. Viewed as a whole, a self-timed ring has a total latency and throughput dependent not only on the individual stages' latencies and cycle times, but also on the total number of stages, tokens, and extra “bubbles” in the ring. This article derives the performance characteristics of self-timed rings, illustrates them with graphs, and discusses the implications for designing rings with optimal performance. Certain suggested ring configurations allow iteration with no latches and zero delay overhead, achieving a total latency equal to just the sum of the raw function-block delays. This property has been verified by measurements on a chip that demonstrates a self-timed ring for the example function of floating-point division. Fabricated in 1.2 μm CMOS, the ring occupies 7 mm² and generates a quotient bit every 2.8 ns.