A simple protocol for latency-insensitive design is presented. The main features of the protocol are the efficient implementation of elastic communication channels and an automatable design methodology. A latch-based implementation with no storage overhead is also proposed. With this approach, fine-grained elasticity can be introduced at the level of functional units (e.g., ALUs, memories). A formal specification of the protocol is defined, and several schemes for implementing elasticity are discussed. The opportunities that this protocol opens for microarchitectural design are illustrated with several examples.

I. MOTIVATION

The time discretization imposed by synchronicity forces designers to make early decisions that often complicate changes in the late stages of the design, or efficient migration to scaled technologies. In deep-submicron (DSM) technologies, the number of cycles required to transmit an event from a sender to a receiver cannot be determined until the final layout has been generated. Some researchers advocate using the modularity and efficiency of asynchronous circuits to devise an object-oriented methodology for complex systems. However, CAD support for asynchronous circuits is still in its infancy.

The question we want to answer in this paper is: can we find an efficient scheme that combines the modularity of asynchronous systems with the simplicity of synchronous implementations?

Other authors have been working in this direction. Latency-insensitive (LI) schemes [CMSV01] were proposed to separate communication from computation and make systems insensitive to the latencies of the computational units and channels. The implementation of LI systems is synchronous [CSV02], [CN01] and uses relay stations at the interfaces between computational units. In a different scenario, synchronous interlocked pipelines [JKB02] were proposed to achieve fine-grained local handshaking at the level of pipeline stages.
The implementation is conceptually similar to a discretized version of traditional asynchronous pipelines with req/ack handshake signals. A de-synchronization approach [HDGC04], [BCK04] automatically transforms synchronous specifications into asynchronous implementations by replacing the clock network with an asynchronous controller. The success of this paradigm will mainly depend on designers' willingness to accept asynchrony in their design flow.

A. Contributions of the paper

The main contributions of the paper are as follows:
• A simple and efficient protocol for latency-insensitive design, together with an abstract model for elastic channels and buffers.
• Demonstration of several architectures and control schemes for implementing elastic buffers and channels. The implementations of LI systems proposed in [CSV02] and the interlocked pipelines of [JKB02] are two particular solutions in this design space.
• An efficient latch-based implementation with no storage overhead, clock gating of all sequential elements, and eager forks.
• A demonstration that the proposed scheme can be applied at different levels of system granularity, in both white-box (e.g., microprocessor design) and black-box (SoC IPs) scenarios. Unlike in [CMSV01], the elastic system, before any additional delays are inserted, has the same sequential latency as the original synchronous design.
• A design methodology with an automatic, correct-by-construction transformation of a synchronous system into an elastic one, together with analytical performance analysis.
• Sequential optimization of the controllers.

II. THE STRUCTURE OF AN ELASTIC SYSTEM

Intuitively, an elastic design is a collection of elastic modules and elastic channels. Each channel propagates data from one module to another. As will be discussed in Section III, channels have control wires that implement a handshake between the sender and the receiver.
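The handshake between sender and receiver can be illustrated with a small cycle-level model. The Python sketch below is a hypothetical illustration, not the paper's controller. It assumes that each channel carries a forward valid bit and a backward stop bit (the stop bit is an assumption at this point in the text). It labels every cycle on a channel as a transfer, a retry, or idle, and it models an elastic buffer as a FIFO whose stop output depends only on its registered occupancy. The names `channel_event` and `simulate` are invented for this example.

```python
from collections import deque
from typing import List, Tuple


def channel_event(valid: bool, stop: bool) -> str:
    """Classify one clock cycle on a channel with a (hypothetical) valid/stop pair."""
    if not valid:
        return "idle"  # no valid data on the channel this cycle
    # Valid data moves only if the receiver is not stopping it
    return "retry" if stop else "transfer"


def simulate(tokens: List[int], sink_stalls: List[bool],
             capacity: int = 2, max_cycles: int = 1000
             ) -> Tuple[List[int], int, List[Tuple[str, str]]]:
    """Sender -> elastic buffer -> receiver, simulated cycle by cycle.

    Assumption: the buffer is a FIFO with `capacity` slots, and its stop
    output to the sender depends only on the occupancy at the start of
    the cycle (no combinational path from the receiver's stop signal).
    """
    pending = deque(tokens)           # data items still held by the sender
    buf: deque = deque()              # current contents of the elastic buffer
    received: List[int] = []          # data items accepted by the receiver
    log: List[Tuple[str, str]] = []   # (input channel, output channel) per cycle
    cycle = 0
    while (pending or buf) and cycle < max_cycles:
        # Control signals, all computed from the state at the start of the cycle
        valid_in = bool(pending)
        stop_in = len(buf) >= capacity
        valid_out = bool(buf)
        stop_out = sink_stalls[cycle] if cycle < len(sink_stalls) else False
        log.append((channel_event(valid_in, stop_in),
                    channel_event(valid_out, stop_out)))
        # Clock edge: perform the transfers decided above (FIFO order kept)
        if valid_out and not stop_out:
            received.append(buf.popleft())
        if valid_in and not stop_in:
            buf.append(pending.popleft())
        cycle += 1
    return received, cycle, log


if __name__ == "__main__":
    data, cycles, trace = simulate([1, 2, 3, 4], [False, True, True, False])
    print(data, cycles)
    for i, (ch_in, ch_out) in enumerate(trace):
        print(i, ch_in, ch_out)
```

With `capacity=2` the buffer sustains one transfer per cycle when the receiver never stalls. With `capacity=1` the throughput is halved, because in this model the stop signal is derived from the occupancy at the start of the cycle, so a full one-slot buffer cannot accept a new item in the same cycle in which it releases one. In every case the receiver sees the same data sequence; only the timing changes.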
For simplicity of explanation, we will initially assume that elastic modules are partitioned into combinational blocks, which perform computations, and sequential elements, which store and propagate the results of those computations. In Section VII we will show the generalization to modules with fixed and variable sequential latencies.

In a fine-grained elastic design, all flip-flops are replaced with Elastic Buffers (EBs). EBs can be composed of Elastic Half-Buffers (EHBs), in the same way that a flip-flop can be implemented as a pair of transparent latches with opposite polarity (master and slave). Thus, the designer of an elastic system can choose between edge-triggered and transparent elements.

Depending on the state of the associated control wires, a channel can carry valid or invalid data items. For simplicity, we will refer to them as tokens and bubbles, respectively.

III. SPECIFICATION OF THE SELF PROTOCOL

This section describes an elastic protocol called SELF (Synchronous ELastic Flow), which can be implemented with the following features:
• Small control overhead, so that it can be used effectively at the level of medium-grain blocks (ALUs, shifters, register files, etc.).
• Scalability: the delay overhead of the protocol is independent of the size of the system.
• A method for design automation that transforms conventional synchronous designs into elastic systems.

Fig. 1 depicts an example of an elastic implementation for transmitting data between two units. Each register has an associated valid bit (V) that keeps track of the validity of the stored data. The clock signal is not explicitly shown, and the enable signal (En) indicates when new data is stored into the register. The chain of AND gates
[1] Alberto L. Sangiovanni-Vincentelli et al., "Combining retiming and recycling to optimize the performance of synchronous circuits," in Proc. 16th Symposium on Integrated Circuits and Systems Design (SBCCI 2003), 2003.
[2] Alberto L. Sangiovanni-Vincentelli et al., "Coping with Latency in SOC Design," IEEE Micro, 2002.
[3] Jean-Christophe Le Lann et al., "POLYCHRONY for System Design," J. Circuits Syst. Comput., 2003.
[4] Luca Benini et al., "Automatic Synthesis of Large Telescopic Units Based on Near-Minimum Timed Supersetting," IEEE Trans. Computers, 1999.
[5] Cheng-Kok Koh et al., "Performance optimization of latency insensitive systems through buffer queue sizing of communication channels," in ICCAD-2003, International Conference on Computer Aided Design (IEEE Cat. No.03CH37486), 2003.