Synthesis and Optimization of Pipelined Packet Processors

We consider pipelined architectures of packet processors consisting of a sequence of simple packet-processing modules interconnected by first-in first-out buffers. We propose a new model for describing their function, an automated synthesis technique that generates efficient hardware for them, and an algorithm for computing minimum buffer sizes that allow such pipelines to achieve their maximum throughput. Our functional model provides a level of abstraction familiar to a network protocol designer; in particular, it does not require knowledge of register-transfer-level hardware design. Our synthesis tool implements the specified function in a sequential circuit that processes packet data a word at a time. Finally, our analysis technique computes the maximum throughput the modules can sustain and then determines the smallest buffers that achieve it. Experiments on industrial-strength examples suggest that our techniques are practical. Our synthesis algorithm can generate circuits that achieve 40 Gb/s on field-programmable gate arrays, matching state-of-the-art manual implementations, and our buffer-sizing algorithm runs fast enough for practical use. Together, our techniques make it easier to quickly develop and deploy high-speed network switches.
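The idea of first finding the peak throughput and then the smallest buffers that reach it can be illustrated in a deliberately simplified model. This sketch is not the authors' algorithm. It assumes each module is a non-pipelined stage that needs a fixed `d` cycles per word (real packet modules have data-dependent behavior that this ignores). It also models the pipeline as a marked graph in the style of latency-insensitive design, where a FIFO slot stays reserved from when the producer starts a word until the consumer finishes it. Under those assumptions, throughput is the minimum token-to-delay ratio over the graph's cycles. The peak is set by the slowest module, and each FIFO needs just enough slots to keep its producer/consumer cycle from becoming the bottleneck. The function names and the `delays`/`buffers` parameters are hypothetical.

```python
from fractions import Fraction
from math import ceil


def max_throughput(delays):
    """Peak rate in words per cycle, limited by the slowest module.

    Assumption: module i is non-pipelined and takes delays[i] cycles per word,
    i.e. a self-loop with one token and delay delays[i] in the marked graph.
    """
    return min(Fraction(1, d) for d in delays)


def pipeline_throughput(delays, buffers):
    """Throughput of the simplified marked-graph model for a linear pipeline.

    buffers[i] is the capacity of the FIFO between module i and module i+1.
    Assumption: a slot is held from the producer's start to the consumer's end,
    so the producer/consumer cycle has buffers[i] tokens and delay
    delays[i] + delays[i+1]. In a linear pipeline these 2-cycles and the
    per-module self-loops are the only simple cycles, so the throughput is
    the minimum ratio tokens/delay over them.
    """
    assert len(buffers) == len(delays) - 1
    ratios = [Fraction(1, d) for d in delays]  # one word in flight per module
    for i, b in enumerate(buffers):
        ratios.append(Fraction(b, delays[i] + delays[i + 1]))
    return min(ratios)


def min_buffer_sizes(delays):
    """Smallest capacities whose 2-cycles are no slower than the peak rate."""
    target = max_throughput(delays)
    # Need b / (d_i + d_{i+1}) >= target, so b = ceil(target * (d_i + d_{i+1})).
    return [ceil(target * (delays[i] + delays[i + 1]))
            for i in range(len(delays) - 1)]


if __name__ == "__main__":
    d = [1, 2, 1, 3]              # hypothetical per-word module delays
    b = min_buffer_sizes(d)
    print(max_throughput(d), b, pipeline_throughput(d, b))
```

Exact `Fraction` arithmetic is used so that taking the ceiling never suffers from floating-point rounding. In this toy model, a chain of single-cycle modules needs two slots per FIFO to run at full rate, the familiar double-buffering result from latency-insensitive design.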
