Leveraging Protocol Knowledge in Slack Matching

Stalls, due to mis-matches in communication rates, are a major performance obstacle in pipelined circuits. If the rate of data production is faster than the rate of consumption, the resulting design performs slower than when the communication rate is matched. This can be remedied by inserting pipeline buffers (to temporarily hold data), allowing the producer to proceed if the consumer is not ready to accept data. The problem of deciding which channels need these buffers (and how many) for an arbitrary communication profile is called the slack matching problem; the optimal solution to this problem has been shown to be NP-complete. In this paper, we present a heuristic that uses knowledge of the communication protocol to explicitly model these bottlenecks, and an iterative algorithm to progressively remove these bottlenecks by inserting buffers. We apply this algorithm to asynchronous circuits, and show that it naturally handles large designs with arbitrarily cyclic and acyclic topologies, which exhibit various types of control choice. The heuristic is efficient, achieving linear time complexity in practice, and produces solutions that (a) achieve up to 60% performance speedup on large media processing kernels, and (b) can either be verified to be optimal, or the approximation margin can be bounded

[1]  Seth Copen Goldstein,et al.  Modeling the Global Critical Path in Concurrent Systems , 2006 .

[2]  Cheng-Kok Koh,et al.  Performance Optimization of Latency Insensitive Systems Through Buffer Queue Sizing of Communication Channels , 2003, ICCAD 2003.

[3]  Baris Taskin,et al.  Delay insertion method in clock skew scheduling , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[4]  Seth Copen Goldstein,et al.  C to Asynchronous Dataflow Circuits: An End-to-End Toolflow , 2004 .

[5]  Majid Sarrafzadeh,et al.  Optimal integer delay budgeting on directed acyclic graphs , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[6]  Paul Day,et al.  Four-phase micropipeline latch control circuits , 1996, IEEE Trans. Very Large Scale Integr. Syst..

[7]  Ganesh Gopalakrishnan,et al.  Performance analysis and optimization of asynchronous circuits , 1994, Proceedings 1994 IEEE International Conference on Computer Design: VLSI in Computers and Processors.

[8]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[9]  Michael Kishinevsky,et al.  Performance Analysis Based on Timing Simulation , 1994, 31st Design Automation Conference.

[10]  Peter A. Beerel,et al.  Slack matching asynchronous designs , 2006, 12th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'06).

[11]  Steven Burns Performance Analysis and Optimization of Asynchronous Circuits , 1991 .

[12]  Peter A. Beerel,et al.  Bounding average time separations of events in stochastic timed Petri nets with choice , 1999, Proceedings. Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems.

[13]  Piyush Prakash,et al.  Slack matching quasi delay-insensitive circuits , 2006, 12th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC'06).