Self-timed rings and their application to division

Self-timed systems avoid the problems associated with the global clocks of synchronous systems. This thesis introduces a new type of structure, called a self-timed ring, that can pass data multiple times through the same function blocks without requiring any external control signals or clocking. The latency and throughput of self-timed rings are analyzed by a method that also determines the performance of asynchronous pipelines as a special case. By meeting certain constraints suggested by this analysis, a self-timed ring can completely hide its control logic delays and operate with zero overhead. If, in addition, the ring is composed of a proposed domino stage configuration without latches, then the ring achieves, in much less area, the same minimal-latency operation as an unrolled combinational array implementing the same function. A prime example of a problem for which a self-timed ring implementation achieves high performance is the iterative computation of division. This thesis compares two self-timed divider chips: a preliminary design and an improved one that adheres to the constraints determined by this research. Measurements showed that these design techniques increased performance by a factor of 2.2 through architectural changes alone. The self-timed ring of the new divider occupies 7 mm$^2$ in a 1.2 $\mu$m CMOS process and computes each quotient bit in 2.9 ns, requiring a total latency of 45 ns to 160 ns for a full 54-bit result, depending on the data operands.
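
The idea of passing data repeatedly through the same function blocks can be illustrated with a minimal software sketch. It is not the divider described in this thesis: the radix-2 restoring step, the function names `stage` and `ring_divide`, and the default of four stages are assumptions chosen only for illustration, and the model says nothing about handshaking, timing, or domino logic.

```python
def stage(remainder, divisor):
    """One function block: shift the partial remainder left by one bit and
    subtract the divisor if it fits. Returns (new_remainder, quotient_bit).
    Assumption: plain restoring division, chosen for simplicity."""
    shifted = remainder << 1
    if shifted >= divisor:
        return shifted - divisor, 1
    return shifted, 0


def ring_divide(dividend, divisor, bits=54, num_stages=4):
    """Compute `bits` fractional quotient bits of dividend/divisor by
    circulating the partial remainder around a ring of identical stages.
    `num_stages` is a hypothetical parameter, not a figure from the thesis."""
    if not 0 <= dividend < divisor:
        raise ValueError("expected 0 <= dividend < divisor")
    ring = [stage] * num_stages          # the same block, reused every lap
    remainder, quotient = dividend, 0
    for step in range(bits):
        block = ring[step % num_stages]  # next stage around the ring
        remainder, bit = block(remainder, divisor)
        quotient = (quotient << 1) | bit
    laps = -(-bits // num_stages)        # ceiling division: trips around the ring
    return quotient, remainder, laps
```

For example, `ring_divide(1, 3, bits=8)` yields the quotient bits `01010101` after two laps of a four-stage ring. As a rough consistency check on the figures above, and assuming one quotient bit per 2.9 ns step, 54 bits would take about 156.6 ns, which is close to the upper end of the reported latency range.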