SELF: Specification and design of synchronous elastic circuits

A simple protocol for latency-insensitive design is presented. The main features of the protocol are the efficient implementation of elastic communication channels and the automatable design methodology. A latch-based implementation with no storage overhead is also proposed. With this approach, fine-granularity elasticity can be introduced at the level of functional units (e.g. ALUs, memories). A formal specification of the protocol is defined and several schemes for the implementation of elasticity are discussed. The opportunities that this protocol opens for microarchitectural design are illustrated with several examples.

I. MOTIVATION

The time discretization imposed by synchronicity forces designers to make early decisions that often complicate changes at the latest stages of the design or efficient migrations to scaled technologies. In deep submicron (DSM) technologies, calculating the number of cycles required to transmit an event from a sender to a receiver is a problem that cannot be solved until the final layout has been generated.

Some researchers advocate the modularity and efficiency of asynchronous circuits as the basis for some kind of object-oriented methodology for complex systems. However, CAD support for asynchronous circuits is still in its prehistory. The question we want to answer in this paper is: can we find an efficient scheme that combines the modularity of asynchronous systems with the simplicity of synchronous implementations?

Other authors have been working in this direction. Latency-insensitive (LI) schemes [CMSV01] were proposed to separate communication from computation and make systems insensitive to the latencies of the computational units and channels. The implementation of LI systems is synchronous [CSV02], [CN01] and uses relay stations at the interfaces between computational units.

In a different scenario, synchronous interlocked pipelines [JKB02] were proposed to achieve fine-grained local handshaking at the level of stages.
The implementation is conceptually similar to a discretized version of traditional asynchronous pipelines with req/ack handshake signals.

A de-synchronization approach [HDGC04], [BCK04] automatically transforms synchronous specifications into asynchronous implementations by replacing the clock network with an asynchronous controller. The success of this paradigm will mainly depend on the attitude of designers towards accepting asynchrony in their design flow.

A. Contributions of the paper

The main contributions of the paper are as follows:

• A simple and efficient protocol for latency-insensitive design and an abstract model for elastic channels and buffers.
• Demonstration of several architectures and control schemes for the implementation of elastic buffers and channels. The implementations of LI systems proposed in [CSV02] and of interlocked pipelines in [JKB02] are two particular solutions in this design space.
• An efficient latch-based implementation with no storage overhead, clock-gating of all sequential elements and eager forks.
• A demonstration that the proposed scheme can be applied at different levels of system granularity and in both white-box (e.g. microprocessor design) and black-box (SoC IPs) scenarios. Contrary to [CMSV01], the elastic system before inserting additional delays has the same sequential latency as the original synchronous design.
• A design methodology with automatic correct-by-construction transformation of a synchronous system into an elastic one and analytical performance analysis.
• Sequential optimization of the controllers.

II. THE STRUCTURE OF AN ELASTIC SYSTEM

Intuitively, an elastic design is a collection of elastic modules and elastic channels. Every channel can propagate data from one module to another. As will be discussed in Section III, channels have control wires implementing a handshake between the sender and the receiver.
For simplicity in the explanation, we will initially assume that elastic modules are partitioned into combinational blocks, which perform computations, and sequential elements, which store and propagate the results of the computations. In Section VII we will show the generalization to modules with fixed and variable sequential latencies.

In the low-granularity elastic design, all flip-flops are replaced with Elastic Buffers (EB). EBs can be composed of Elastic Half-Buffers (EHB), in the same fashion as flip-flops can be implemented as a pair of transparent latches with opposite polarity (master and slave). Thus, a designer of an elastic system has a choice between using edge-triggered or transparent elements.

Depending on the state of the associated control wires, a channel can carry valid or invalid data items. For simplicity, we will talk about tokens and bubbles, respectively.

III. SPECIFICATION OF THE SELF PROTOCOL

This section describes an elastic protocol called SELF (Synchronous ELastic Flow) that can be implemented with the following features:

• Small control overhead, so that the protocol can be used effectively at the level of medium-grain blocks (ALUs, shifters, register files, etc.).
• Scalability, in the sense that the delay overhead of the protocol is independent of the size of the system.
• A method for design automation to transform conventional synchronous designs into elastic systems.

Fig. 1 depicts an example of an elastic implementation for transmitting data between two units. Each register has an associated valid bit (V) that keeps track of the validity of the stored data. The clock signal is not explicitly shown, and the enable signal (En) indicates when new data is stored into the register. The chain of AND gates manages the back-pressure generated by the receiver when it is not able to accept data (Stop = 1). The scheme in Fig. 1 is not scalable due to the long combinational path from the receiver to the sender. When the pipeline is full, i.e. all V's are at 1, the delay of the Stop chain becomes critical.
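The behaviour described for Fig. 1 can be sketched as a small cycle-level simulation. This is a minimal sketch and not the circuit from the paper: since the figure is not reproduced here, it assumes that each stage stalls when it holds valid data and its downstream neighbour is stalled (one AND gate per stage), and that En is the complement of that stall signal. The names `stop_chain` and `step` are hypothetical, and a bubble is modelled as `None`.

```python
from typing import List, Optional, Tuple

Item = Optional[int]  # None models a bubble (V = 0); an int models a token (V = 1)


def stop_chain(valid: List[bool], stop_in: bool) -> List[bool]:
    """Propagate the receiver's Stop backwards through one AND gate per stage.

    Assumption: stage i is stalled iff it holds valid data and the stage
    after it (or the receiver, for the last stage) is stalled.
    """
    stops = [False] * len(valid)
    s = stop_in
    for i in reversed(range(len(valid))):
        s = valid[i] and s  # the AND gate of stage i
        stops[i] = s
    return stops


def step(regs: List[Item], new_item: Item, stop_in: bool) -> Tuple[List[Item], bool, Item]:
    """Advance the pipeline by one clock cycle.

    Returns (next register contents, sender_stalled, item delivered to receiver).
    """
    valid = [r is not None for r in regs]
    stops = stop_chain(valid, stop_in)
    delivered = regs[-1] if not stop_in else None
    nxt = list(regs)
    for i in range(len(regs)):
        if not stops[i]:  # En_i = 1: the register loads from its predecessor
            nxt[i] = regs[i - 1] if i > 0 else new_item
    return nxt, stops[0], delivered


# Receiver stalls while a bubble sits in the middle: the bubble is squeezed out.
regs, stalled, out = step([1, None, 2], new_item=3, stop_in=True)
print(regs, stalled, out)  # [3, 1, 2] False None

# Pipeline is now full and the receiver still stalls: Stop reaches the sender
# after traversing every AND gate, which is the long path noted in the text.
regs, stalled, out = step(regs, new_item=4, stop_in=True)
print(regs, stalled, out)  # [3, 1, 2] True None
```

In this model the loop in `stop_chain` visits every stage within a single cycle, which mirrors why the combinational Stop path grows with the length of the pipeline when all valid bits are 1.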

[1] Kenneth L. McMillan, et al. Verification of Infinite State Systems by Compositional Model Checking, 1999, CHARME.

[2] Luciano Lavagno, et al. Handshake protocols for de-synchronization, 2004, 10th International Symposium on Asynchronous Circuits and Systems.

[3] David L. Dill, et al. Polynomial-time techniques for approximate timing analysis of asynchronous systems, 1998.

[4] Sandeep K. Shukla, et al. Presentation and Formal Verification of a Family of Protocols for Latency Insensitive Design, 2005.

[5] Alberto L. Sangiovanni-Vincentelli, et al. Coping with Latency in SOC Design, 2002, IEEE Micro.

[6] P. R. Stephan, et al. SIS: A System for Sequential Circuit Synthesis, 1992.

[7] Cheng-Kok Koh, et al. Performance Optimization of Latency Insensitive Systems Through Buffer Queue Sizing of Communication Channels, 2003, ICCAD 2003.

[8] Luca Benini, et al. Automatic Synthesis of Large Telescopic Units Based on Near-Minimum Timed Supersetting, 1999, IEEE Trans. Computers.

[9] Alberto L. Sangiovanni-Vincentelli, et al. Theory of latency-insensitive design, 2001, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.

[10] Pradip Bose, et al. Synchronous interlocked pipelines, 2002, Proceedings Eighth International Symposium on Asynchronous Circuits and Systems.

[11] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.

[12] Fred Kröger, et al. Temporal Logic of Programs, 1987, EATCS Monographs on Theoretical Computer Science.

[13] Tadao Murata, et al. Petri nets: Properties, analysis and applications, 1989, Proc. IEEE.

[14] Alberto L. Sangiovanni-Vincentelli, et al. Combining retiming and recycling to optimize the performance of synchronous circuits, 2003, 16th Symposium on Integrated Circuits and Systems Design (SBCCI 2003).

[15] Madhav P. Desai, et al. A novel technique towards eliminating the global clock in VLSI circuits, 2004, 17th International Conference on VLSI Design.

[16] Jean-Christophe Le Lann, et al. POLYCHRONY for System Design, 2003, J. Circuits Syst. Comput.

[17] Steven M. Nowick, et al. Robust interfaces for mixed-timing systems with application to latency-insensitive protocols, 2001, Proceedings of the 38th Design Automation Conference.